High-throughput functional annotation and data mining with the Blast2GO suite

2008 | Stefan Götz, Juan Miguel García-Gómez, Javier Terol, Tim D. Williams, Shivashankar H. Nagaraj, María José Nueda, Montserrat Robles, Joaquín Dopazo and Ana Conesa
The Blast2GO suite is an integrated, biologist-oriented tool for high-throughput, automatic functional annotation of DNA or protein sequences based on the Gene Ontology (GO) vocabulary. It combines several annotation strategies with tools that control the type and intensity of annotation, offers graphical features such as interactive GO-graph visualization for gene-set function profiling and descriptive charts, and includes general sequence management features and high-throughput capabilities. The authors also used the tool to analyze how annotation by homology transfer behaves and how it affects functional genomics research. The goal is to give biologists useful information to consider when functionally characterizing their sequence data.

Blast2GO v.1 was released in 2005 as a biologist-oriented, high-throughput, quality data-mining tool and has been used in various functional annotation projects, mainly for non-model species. These uses include automatic GO annotation of EST collections, functional interpretation of microarray studies, genome comparison studies, and general descriptions of bioinformatics methodology.

Blast2GO v.2 is a comprehensive suite for high-throughput functional annotation and data mining of novel sequences. It adds new application functionality, provides a deeper understanding of annotation modulation, and gives practical insight into the potential and risks of automatic annotation used as a discovery tool in the functional genomics study of poorly characterized sequence data. Its features include a hit coverage filter, BLAST description annotation, GO-slim, Annex, enzyme code annotation and KEGG pathway visualization, InterProScan, a manual curation tool, and annotation coherency.
It also includes descriptive charts and sequence management features; graphical features such as graph performance, graph colouring and information content, graph term filtering, and multilevel pie charts; and new high-throughput utilities such as a pipeline version and high-throughput BLAST.

The annotation process involves three basic steps: homologue search, GO term mapping, and the actual annotation. The annotation rule computes an annotation score from sequence similarity and abstraction. The score formula is the sum of two terms, a similarity term and an abstraction term, and takes the GO hierarchy into account. The similarity term reflects the sequence similarity to the homologous sequence, modulated by the evidence code of each of its annotations. The abstraction term multiplies the total number of GO terms unified at the parent term by a user-defined GO weight factor.

Annotation styles were defined to evaluate how similarity transfer and the Blast2GO-specific annotation parameters affect annotation results. The parameters considered were the degree of homology (BLAST e-value cut-off), the sequence similarity-based annotation score, the quality of transferred annotations (evidence code weights), and the intensity of abstraction to parent terms (GO weight). BLAST-based versus domain-based (InterPro) transfer and automatic augmentation through the Annex strategy were also included. The evaluation tasks covered annotation performance, manual curation, cis annotation, and functional genomics. The results showed that the choice of annotation ...
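
The annotation rule described above (a similarity term plus an abstraction term, checked against a score cut-off) can be illustrated with a small Python sketch. This is not Blast2GO code. The names `AnnotationStyle`, `HitAnnotation`, `similarity_term`, `abstraction_term`, `annotation_score` and `passes_cutoff` are hypothetical, and the default parameter values and evidence code weights are placeholders chosen for illustration. The sketch also makes three assumptions that the summary does not state: the similarity term is the best weighted similarity among hits passing the e-value cut-off, the abstraction term counts exactly the number of GO terms passed in by the caller, and a candidate term with no usable hit scores zero.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class AnnotationStyle:
    """Hypothetical bundle of the tunable parameters named in the text."""
    evalue_cutoff: float = 1e-6   # illustrative BLAST e-value cut-off
    score_cutoff: float = 55.0    # illustrative annotation score threshold
    go_weight: float = 5.0        # illustrative GO weight factor
    # Illustrative evidence code weights (assumed values, not from the source).
    ec_weights: Dict[str, float] = field(
        default_factory=lambda: {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}
    )


@dataclass(frozen=True)
class HitAnnotation:
    """One GO annotation carried by one BLAST hit (illustrative record)."""
    similarity: float    # percent similarity between query and hit, 0-100
    evalue: float        # BLAST e-value of the hit
    evidence_code: str   # evidence code attached to the hit's annotation


def similarity_term(hits: List[HitAnnotation], style: AnnotationStyle) -> float:
    """Best similarity weighted by evidence code, over hits passing the e-value cut-off."""
    weighted = [
        h.similarity * style.ec_weights.get(h.evidence_code, 0.0)
        for h in hits
        if h.evalue <= style.evalue_cutoff
    ]
    return max(weighted, default=0.0)


def abstraction_term(n_unified_gos: int, style: AnnotationStyle) -> float:
    """Number of GO terms unified at the candidate parent term times the GO weight."""
    return n_unified_gos * style.go_weight


def annotation_score(hits: List[HitAnnotation], n_unified_gos: int,
                     style: AnnotationStyle) -> float:
    """Sum of the two terms; zero when no hit passes the e-value cut-off (assumption)."""
    if not any(h.evalue <= style.evalue_cutoff for h in hits):
        return 0.0
    return similarity_term(hits, style) + abstraction_term(n_unified_gos, style)


def passes_cutoff(hits: List[HitAnnotation], n_unified_gos: int,
                  style: AnnotationStyle) -> bool:
    """True when the candidate GO term reaches the annotation score cut-off."""
    return annotation_score(hits, n_unified_gos, style) >= style.score_cutoff


if __name__ == "__main__":
    style = AnnotationStyle()
    hits = [
        HitAnnotation(similarity=80.0, evalue=1e-20, evidence_code="IEA"),
        HitAnnotation(similarity=60.0, evalue=1e-10, evidence_code="IDA"),
        HitAnnotation(similarity=95.0, evalue=1e-3, evidence_code="IDA"),  # fails e-value cut-off
    ]
    print(annotation_score(hits, n_unified_gos=2, style=style))
    print(passes_cutoff(hits, n_unified_gos=2, style=style))
```

Keeping the parameters in a single style object mirrors the idea of an annotation style: a stricter or more permissive run changes only the e-value cut-off, score cut-off, evidence code weights or GO weight, while the scoring logic stays the same.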