1999 | Laurie J. Heyer, Semyon Kruglyak, and Shibu Yooseph
This paper presents a systematic approach to analyzing gene expression data, focused on identifying and analyzing coexpressed genes, and applies it to yeast cell cycle data. The approach comprises a similarity measure that reduces false positives, a new clustering algorithm for grouping gene expression patterns, and an interactive graphical cluster analysis tool for user feedback and validation. The clusters generated by the algorithm are used to summarize genome-wide expression and to initiate supervised clustering of genes into biologically meaningful groups.
The study uses oligonucleotide arrays and cDNA microarrays to measure the expression levels of thousands of genes in parallel. These technologies have enabled the exploration of various biological processes, including the yeast cell cycle. The yeast Saccharomyces cerevisiae is an excellent model organism for such experiments due to its sequenced genome and well-characterized ORFs. The study analyzed data from a cell cycle experiment that involved 17 time points, with mRNA isolated from samples taken at 10-minute intervals. The data were processed to remove control sequences and ORFs with potential issues in probe design. After filtering, 4169 ORFs were left at 16 time points.
The authors developed a similarity measure called jackknife correlation, which is robust to single outliers and reduces false positives. They then applied a clustering algorithm, QT_Clust, to group ORFs into clusters with a quality guarantee: no cluster's diameter exceeds a given threshold, so all ORFs in a cluster have a high jackknife correlation with one another. Applied to the filtered data, the algorithm produced 24 large clusters that represent different expression patterns.
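The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions this summary does not confirm: the jackknife correlation is taken to be the minimum Pearson correlation over the full series and every leave-one-out series, the distance between two ORFs is taken to be one minus their jackknife correlation, and the clustering step is a simple greedy quality-threshold procedure. The names `jackknife_correlation` and `qt_clust` are illustrative.

```python
import numpy as np


def jackknife_correlation(x, y):
    """Minimum Pearson correlation over the full series and each
    leave-one-time-point-out series (assumed formulation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = [np.corrcoef(x, y)[0, 1]]
    for k in range(len(x)):
        keep = np.arange(len(x)) != k  # drop time point k
        values.append(np.corrcoef(x[keep], y[keep])[0, 1])
    return min(values)


def qt_clust(dist, max_diameter):
    """Greedy quality-threshold clustering on a symmetric distance matrix.

    Every returned cluster has diameter (largest pairwise distance)
    at most max_diameter.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            members = [seed]
            # far[c] = distance from candidate c to its farthest current member
            far = {c: dist[seed][c] for c in remaining if c != seed}
            while far:
                # Adding the candidate with the smallest far[c] gives the
                # smallest possible diameter for the enlarged cluster.
                nxt = min(far, key=lambda c: (far[c], c))
                if far[nxt] > max_diameter:
                    break
                members.append(nxt)
                del far[nxt]
                for c in far:
                    far[c] = max(far[c], dist[nxt][c])
            if len(members) > len(best):
                best = members
        clusters.append(sorted(best))  # keep the largest candidate cluster
        remaining -= set(best)
    return clusters


if __name__ == "__main__":
    # Hypothetical expression profiles: rows are ORFs, columns are time points.
    profiles = np.array([
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 4.0, 6.0, 8.0, 10.5],
        [5.0, 4.0, 3.0, 2.0, 1.0],
        [5.0, 4.0, 3.0, 2.0, 0.5],
    ])
    n = len(profiles)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - jackknife_correlation(profiles[i], profiles[j])
            dist[i, j] = dist[j, i] = d
    print(qt_clust(dist, max_diameter=0.3))
```

Taking the minimum over leave-one-out correlations means a single shared spike cannot make two otherwise unrelated profiles look similar, which is the sense in which the measure reduces false positives. The clustering sketch re-seeds from every remaining ORF on each pass, so its cost grows quickly with the number of ORFs; it is meant to show the diameter guarantee, not to scale to thousands of genes.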
The clusters were analyzed to identify biologically meaningful groups of genes. The authors demonstrated the effectiveness of their method by identifying cell cycle-regulated genes and genes belonging to specific gene families. They also showed how the method can identify candidate genes that may be controlled by a common regulatory system, by analyzing the chromosomal location of ORFs with high jackknife correlation.
The study concludes that the proposed method provides a systematic approach for analyzing gene expression data, with the ability to identify coexpressed genes and their biological significance. The method is robust to outliers and provides a quality guarantee for clusters, making it a valuable tool for further study of gene function and regulation.