Understanding GoMiner: a resource for biological interpretation of genomic and proteomic data

GoMiner is a freely available program package that organizes lists of 'interesting' genes for biological interpretation in the context of the Gene Ontology (GO). It provides quantitative and statistical output files and two visualizations: a tree-like structure and a compact, dynamically interactive directed acyclic graph (DAG). Genes displayed in GoMiner are linked to major public bioinformatics resources.

Gene-expression profiling and other high-throughput genomic and proteomic studies are revolutionizing biology. However, these technologies pose new challenges, particularly the challenge of biological interpretation. GoMiner addresses this challenge by incorporating the hierarchical structure of the GO to automate the functional categorization of gene lists. It allows users to input lists of under- and overexpressed genes, together with all genes on the array, and then calculates the enrichment or depletion of categories in genes whose expression has changed.

GoMiner takes as input two lists of genes: the total set on the array and the subset that the user flags as interesting. It displays the genes within the framework of the GO hierarchy, both as a DAG and as a tree structure. Each category is annotated with the number of genes from the user's experiment assigned to that category plus the number assigned to its progeny categories. The user can designate each gene in the 'interesting gene' list as under- or overexpressed; in the tree-like view, these genes are tagged with green down-arrows or red up-arrows, respectively. The most important parameter for interpretation is the enrichment (or depletion) of a category with respect to flagged genes. This parameter is discussed in detail in the section on 'Statistical considerations'. GoMiner also provides a second visualization, a DAG programmed as a scalable vector graphic (SVG), that can be navigated fluently.
Any of its nodes can be moused over to list the flagged genes, or clicked to highlight the multiple pathways that connect it to the root. Detailed quantitative and statistical results can be downloaded in several tab-delimited formats.

GoMiner is built on a variety of open-source Java classes and developer tools, plus substantial in-house custom software engineering. It is a client-server application in which users interact with a server-side database through JDBC. The primary client GUI, written with the Java Swing API, is a three-panel window in which the user can inspect GO categories and genes. GoMiner also provides a command-line version for high-throughput applications and integration with other programs. The heart of GoMiner is its processing engine, which parses input gene lists and retrieves the database entries that associate those genes with GO categories. The GO categories and gene associations are stored in a relational database; to speed up data manipulation, this information is modeled in memory as a DAG data structure. GoMiner is flexible because it is coded in Java, which makes it platform-independent, and because it can accommodate either the default GO hierarchy or customized versions. The default is the GO Consortium's database of categories and gene associations as implemented on …
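Both the per-category counts (genes in a category plus its progeny) and the highlighting of multiple pathways to the root come down to traversing the in-memory GO DAG. The sketch below is illustrative only and uses Python for brevity, whereas GoMiner itself is written in Java: the class `GoDag`, its methods, and the toy category IDs and gene symbols are hypothetical. It assumes each category stores its parents and its directly annotated genes, and it counts a gene once even when the gene is reachable through several child paths (whether GoMiner deduplicates in exactly this way is an assumption).

```python
from collections import defaultdict


class GoDag:
    """Hypothetical in-memory GO hierarchy: categories linked child -> parents."""

    def __init__(self, parents, direct_genes):
        self.parents = parents             # category -> list of parent categories
        self.direct_genes = direct_genes   # category -> set of directly annotated genes
        self.children = defaultdict(list)  # inverted edges, for walking downward
        for child, ps in parents.items():
            for p in ps:
                self.children[p].append(child)

    def genes_with_progeny(self, category):
        """Genes annotated to `category` or to any descendant, each counted once."""
        seen, stack, genes = set(), [category], set()
        while stack:
            node = stack.pop()
            if node in seen:  # a node reachable via several parents is visited once
                continue
            seen.add(node)
            genes |= self.direct_genes.get(node, set())
            stack.extend(self.children.get(node, []))
        return genes

    def count_flagged(self, category, flagged):
        """Number of flagged genes in `category` plus its progeny."""
        return len(self.genes_with_progeny(category) & flagged)

    def paths_to_root(self, category):
        """Every path from `category` up to a root (a category with no parents)."""
        ps = self.parents.get(category, [])
        if not ps:
            return [[category]]
        return [[category] + path for p in ps for path in self.paths_to_root(p)]


# Toy hierarchy (placeholder IDs): C has two parents, so this is a DAG, not a tree.
dag = GoDag(
    parents={"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]},
    direct_genes={"A": {"g1"}, "B": {"g2"}, "C": {"g1", "g3"}},
)
print(sorted(dag.genes_with_progeny("A")))
print(dag.count_flagged("root", {"g1", "g3"}))
print(dag.paths_to_root("C"))
```

Because category C has two parents, `paths_to_root("C")` returns two paths, and a tree-like rendering of the same hierarchy would typically repeat C under each parent. Enumerating every path can grow quickly in heavily multi-parented regions, so a production implementation might cache intermediate results.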
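The enrichment (or depletion) of a category with respect to flagged genes can be sketched as a one-sided Fisher's exact test, that is, a hypergeometric tail probability. This formulation is an assumption made for illustration; the paper's 'Statistical considerations' section is the authority on what GoMiner actually computes. The function names and arguments are hypothetical: `N` is the number of genes on the array, `K` the number flagged as interesting, `n` the number in the category (including progeny), and `k` the number of flagged genes in the category.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x) when n genes are drawn from N genes of which K are flagged."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def enrichment_stats(N, K, n, k):
    """Enrichment ratio and one-sided Fisher's exact p-values for one category.

    N: genes on the array, K: flagged genes on the array,
    n: genes in the category, k: flagged genes in the category.
    """
    # Feasible values of X given the margins of the 2x2 table.
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {x: hypergeom_pmf(x, N, K, n) for x in range(lo, hi + 1)}
    p_enriched = sum(p for x, p in pmf.items() if x >= k)  # P(X >= k)
    p_depleted = sum(p for x, p in pmf.items() if x <= k)  # P(X <= k)
    # Flagged fraction in the category relative to the flagged fraction on the array.
    ratio = (k / n) / (K / N) if n and K else float("nan")
    return ratio, p_enriched, p_depleted


# Toy numbers (not from the paper): 20 genes on the array, 5 flagged,
# and a category of 4 genes, 3 of which are flagged.
ratio, p_enriched, p_depleted = enrichment_stats(N=20, K=5, n=4, k=3)
print(f"ratio={ratio:.2f} p_enriched={p_enriched:.4f} p_depleted={p_depleted:.4f}")
```

For these toy numbers the ratio is 3.0 and P(X ≥ 3) = 155/4845, about 0.032. With SciPy installed, `scipy.stats.fisher_exact([[k, n - k], [K - k, N - K - n + k]], alternative="greater")` (or `"less"`) should return the same one-sided p-values. Neither version corrects for testing many categories at once.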

GoMiner: a resource for biological interpretation of genomic and proteomic data

25 March 2003 | Barry R Zeeberg, Weimin Feng†, Geoffrey Wang‡, May D Wang‡, Anthony T Fojo, Margot Sunshine§, Sudarshan Narasimhan§, David W Kane§, William C Reinhold, Samir Lababidi, Kimberly J Bussey, Joseph Riss†, J Carl Barrett† and John N Weinstein