2005, Vol. 33, Database issue | Tanya Barrett, Tugba O. Suzek, Dennis B. Troup, Stephen E. Wilhite, Wing-Chi Ngau, Pierre Ledoux, Dmitry Rudnev, Alex E. Lash, Wataru Fujibuchi and Ron Edgar*
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is a comprehensive repository for high-throughput molecular abundance data, primarily gene expression data. As of 2004, GEO holds over 30,000 submissions, representing approximately half a billion individual molecular abundance measurements across more than 100 organisms. The database supports a range of data types, including microarray-based experiments and non-array-based technologies such as SAGE and mass spectrometry proteomics. Recent developments in GEO include user-friendly web-based interfaces for mining and visualizing these data. These tools let users explore the data from both experiment-centric and gene-centric perspectives, making the data accessible to those without computational or microarray expertise. The database is organized into Platform, Sample, and Series entities, each assigned a unique accession number. Data can be submitted through interactive web forms, as Simple Omnibus Format (SOFT) files, or in MAGE-ML format via FTP. GEO DataSets (GDS) provide a coherent synopsis of experiments, while GEO Profiles offer gene-centric views. Retrieval, query, and analysis features include cluster heat maps, query subset comparisons, subset effects, value distribution, and GEO BLAST for sequence similarity searches. Integration with other NCBI resources enhances the utility of GEO data for biological research. Future plans include further development of submission and retrieval formats, enhanced data visualization, and integration with additional data types.
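The entity organization described above can be illustrated with a small sketch. It assumes the conventional GEO accession prefixes (GPL for Platform, GSM for Sample, GSE for Series, GDS for DataSet), which are not spelled out in this summary. The names `Accession` and `parse_accession` are hypothetical helpers introduced only for illustration and are not part of any NCBI tool.

```python
import re
from dataclasses import dataclass

# Assumed mapping from accession prefix to GEO entity type.
ENTITY_BY_PREFIX = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# A prefix followed by one or more digits, e.g. "GSE1234".
ACCESSION_RE = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


@dataclass(frozen=True)
class Accession:
    """Hypothetical container for a parsed accession."""
    entity: str
    number: int


def parse_accession(text: str) -> Accession:
    """Classify an accession string by its prefix (illustrative only)."""
    match = ACCESSION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized GEO accession: {text!r}")
    prefix, digits = match.groups()
    return Accession(ENTITY_BY_PREFIX[prefix], int(digits))
```

For example, `parse_accession("gsm42")` yields an `Accession` whose entity is `"Sample"`, while a string with an unknown prefix raises `ValueError`.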