December 1998 | MICHAEL B. EISEN, PAUL T. SPELLMAN, PATRICK O. BROWN, AND DAVID BOTSTEIN
This paper describes a system for cluster analysis of genome-wide expression data from DNA microarray hybridization. The method uses standard statistical algorithms to arrange genes according to the similarity of their expression patterns. The output is displayed graphically, so biologists can inspect the clustering and the underlying expression data together. In the budding yeast Saccharomyces cerevisiae, the authors found that clustering expression data efficiently groups together genes of known similar function, and they observed a similar tendency in human data. This suggests that the patterns seen in genome-wide expression experiments can be read as indicators of the status of cellular processes. Coexpression of genes of known function with poorly characterized genes may therefore provide leads to the functions of many genes for which no information is currently available.
The rapid advance of genome-scale sequencing has driven the development of methods that exploit this information to characterize biological processes in new ways. Knowing the coding sequence of virtually every gene in an organism invites the development of technology to study the expression of all genes at once, and several techniques have emerged to monitor transcript abundance for every gene of an organism. This paper addresses how to analyze and present the resulting information on this genomic scale.
A natural first step in extracting this information is to examine the extremes, such as the genes with the largest differential expression. Such analyses, however, do not fully use the potential of genome-scale experiments to change our understanding of cellular biology. What is needed is a holistic approach that reveals order in the entire set of observations.
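For contrast, the "extremes" approach can be sketched as a threshold filter on expression ratios. This is an illustrative sketch, not taken from the paper: the function name `differentially_expressed`, the log2-ratio input format, and the two-fold cutoff are all assumptions. The filter returns a list of notable genes but says nothing about how their profiles relate to one another, which is the information the clustering approach sets out to recover.

```python
import math

def differentially_expressed(profiles, min_fold=2.0):
    """Hypothetical helper: names of genes whose |log2 ratio| reaches
    log2(min_fold) in at least one condition.

    `profiles` maps gene name -> list of log2 ratios (experiment vs. reference).
    The default two-fold cutoff is an illustrative assumption.
    """
    cutoff = math.log2(min_fold)
    return sorted(
        gene for gene, ratios in profiles.items()
        if any(abs(r) >= cutoff for r in ratios)
    )

# Hypothetical data: three genes measured under three conditions
profiles = {
    "geneA": [0.1, 1.5, 2.3],    # induced in later conditions
    "geneB": [0.2, -0.3, 0.1],   # essentially flat
    "geneC": [-1.2, -0.8, 0.0],  # repressed in the first condition
}
print(differentially_expressed(profiles))  # ['geneA', 'geneC']
```

Raising `min_fold` to 4.0 keeps only `geneA`. Either way, the output is a flat list: the filter does not reveal whether the retained genes share a temporal pattern.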
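A natural basis for organizing gene expression data is to group together genes with similar expression patterns. The authors used a correlation coefficient as the measure of similarity; it rewards similarity in the "shape" of two expression profiles while ignoring differences in their magnitude. Clustering methods fall into two general classes: supervised methods, which assign genes to categories defined in advance from prior knowledge, and unsupervised methods, which look for structure in the data without such input. Because prior knowledge of gene expression patterns is limited, the authors favored unsupervised or hybrid approaches.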
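The "shape, not magnitude" property is easy to see in a short sketch. By default it computes the Pearson correlation coefficient. As we understand the paper, its metric generalizes this with a reference offset and reduces to Pearson when the offset is the profile mean; the hypothetical `offset` parameter mimics that choice. Treat this as an illustration of the idea, not the authors' code.

```python
import math

def similarity(x, y, offset=None):
    """Correlation-style similarity between two expression profiles.

    offset=None centers each profile on its own mean (Pearson correlation).
    A fixed offset, e.g. 0.0 for log ratios ("no change versus reference"),
    gives an uncentered variant. The parameter name is illustrative.
    """
    n = len(x)
    if n == 0 or len(y) != n:
        raise ValueError("profiles must be non-empty and of equal length")
    ox = sum(x) / n if offset is None else offset
    oy = sum(y) / n if offset is None else offset
    dx = [xi - ox for xi in x]
    dy = [yi - oy for yi in y]
    # Root-mean-square deviation from the offset, playing the role of a standard deviation
    sx = math.sqrt(sum(d * d for d in dx) / n)
    sy = math.sqrt(sum(d * d for d in dy) / n)
    if sx == 0 or sy == 0:
        raise ValueError("similarity is undefined for a profile with zero spread")
    return sum(a * b for a, b in zip(dx, dy)) / (n * sx * sy)

base = [1.0, 2.0, 0.5, -1.0]  # hypothetical log ratios over four conditions
print(round(similarity(base, [3 * v for v in base]), 6))  # same shape, 3x magnitude: 1.0
print(round(similarity(base, [-v for v in base]), 6))     # mirror-image shape: -1.0
```

A gene whose response has the same timing but three times the amplitude still scores as perfectly similar. With `offset=None`, a constant shift of the whole profile is also ignored. With a fixed offset such as `0.0`, a shift is not ignored, so the choice of offset determines what counts as "the same shape."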
To illustrate their approach, the authors used pairwise average-linkage cluster analysis, a form of hierarchical clustering familiar to most biologists from sequence and phylogenetic analysis. Relationships among genes are represented by a tree whose branch lengths reflect the degree of similarity between genes. The computed tree can then be used to reorder the genes in the original data table so that genes with similar expression patterns sit next to one another. Biologists can then read the clustering and the raw data together and develop an integrated understanding of the process being studied.
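The procedure can be sketched as plain agglomerative clustering followed by a tree traversal. This is a minimal illustration under stated assumptions: similarity is the ordinary Pearson correlation, average linkage is taken in its textbook sense (the mean of all pairwise similarities between members of two clusters), and the row order is a simple left-to-right reading of the tree. The function names are hypothetical, and the paper's own node-update and branch-orientation rules may differ in detail.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom

def average_linkage(profiles):
    """Hypothetical helper: agglomerative clustering with average linkage.

    Returns a nested tree: a leaf is a gene name; an internal node is
    (left, right, similarity_at_merge).
    """
    names = list(profiles)
    sim = {(a, b): pearson(profiles[a], profiles[b]) for a, b in combinations(names, 2)}

    def pair_sim(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    # Each active cluster is (subtree, list of member gene names)
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # Average linkage: mean similarity over all cross-cluster gene pairs
            s = sum(pair_sim(a, b) for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or s > best[0]:
                best = (s, i, j)
        s, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        merged = ((ti, tj, s), mi + mj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

def leaf_order(tree):
    """Read genes off the tree left to right, giving a row order for the data table."""
    if isinstance(tree, str):
        return [tree]
    left, right, _ = tree
    return leaf_order(left) + leaf_order(right)

# Hypothetical profiles: g1/g3 rise together, g2/g4 fall together
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [-1.0, -2.0, -3.0, -4.0],
    "g3": [2.0, 4.1, 6.0, 8.2],
    "g4": [-2.0, -3.9, -6.1, -8.0],
}
tree = average_linkage(profiles)
print(leaf_order(tree))  # ['g1', 'g3', 'g2', 'g4']
```

Reordering the table rows by `leaf_order(tree)` places co-regulated genes next to each other. The stored merge similarity `s` can serve as the branch height when drawing the tree, for example as `1 - s`. The exhaustive pair search is fine for a sketch but scales poorly. For real data, a library routine such as SciPy's `linkage` with `method="average"` would be the usual choice.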
The results show that genes with similar functions cluster together, indicating that expression data can organize genes into functional categories. This suggests that gene expression patterns can be used to infer functional relationships between genes. The authors conclude that the clustering approach is useful for analyzing gene expression data and that similar methods may be applied to other large data sets.