2010 | Tanya Barrett*, Dennis B. Troup, Stephen E. Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F. Kim, Maxim Tomashevsky, Kimberly A. Marshall, Katherine H. Phillippy, Patti M. Sherman, Rolf N. Muertter, Michelle Holko, Oluwabukunmi Ayanbule, Andrey Yefanov and Alexandra Soboleva
The Gene Expression Omnibus (GEO) database, established by the National Center for Biotechnology Information (NCBI) in 2000, has evolved from a repository for high-throughput gene expression data to a comprehensive archive for diverse functional genomics data. Over the past decade, GEO has adapted to accommodate various data types, including microarray and next-generation sequencing data. It now hosts over 20,000 studies, comprising more than 500,000 samples, and continues to be the primary source for high-throughput data submissions. The database provides tools for searching, browsing, downloading, and visualizing data at the level of individual genes or entire studies. Recent enhancements include improved search and data representation tools, as well as a review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
GEO's database structure includes primary records (Platform, Sample, Series) and secondary curated DataSets and Profiles. The primary database allows for flexible storage of diverse data types, while the secondary databases standardize and process data for analysis. Submission procedures include various formats, such as GEOarchive spreadsheets, web-based forms, and plain text/XML. All submissions are validated for correct structure and content, with curators ensuring data quality. Submitters can update records and keep them private until a manuscript is published.
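The relationship between the primary record types can be sketched in a few lines of Python. This is only an illustrative model, not GEO's actual schema: the class fields are hypothetical simplifications, and the accession prefixes (GPL, GSM, GSE, GDS) are GEO's conventional prefixes rather than something stated in the summary above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Conventional GEO accession prefixes (assumed here, not given in the text above).
ACCESSION_PREFIXES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

_ACCESSION_RE = re.compile(r"(GPL|GSM|GSE|GDS)(\d+)")


def record_type(accession: str) -> str:
    """Return the record type implied by an accession's prefix."""
    match = _ACCESSION_RE.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return ACCESSION_PREFIXES[match.group(1)]


@dataclass
class Sample:
    """Hypothetical, minimal Sample record: one measurement set on one Platform."""
    accession: str
    platform: str  # accession of the Platform the Sample was measured on


@dataclass
class Series:
    """Hypothetical, minimal Series record: a study grouping related Samples."""
    accession: str
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Collect the distinct Platform accessions used by this Series."""
        return {sample.platform for sample in self.samples}
```

In this sketch a Series simply holds a list of Samples, each pointing at one Platform; the curated DataSets and Profiles would be derived from such primary records and are not modelled here.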
GEO supports next-generation sequence data for studies on gene expression, genome-protein interactions, methylation, and regulation. It hosts processed and analyzed sequence data alongside metadata, with raw data linked to the Sequence Read Archive (SRA). GEO provides robust tools for querying, browsing, downloading, and programmatic access to data. These tools include advanced search functions, cluster heatmaps, and graphical displays of gene expression patterns. GEO also offers extensive internal and external links to related data and resources.
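As a rough illustration of programmatic access, the sketch below builds a search request for GEO records. It assumes access goes through NCBI's Entrez E-utilities `esearch` endpoint with the `gds` database name; the summary above does not name a specific interface, and the helper `build_geo_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI Entrez E-utilities esearch (not named in the text above).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that queries the Entrez 'gds' (GEO DataSets) database.

    `term` is a free-text Entrez query; `retmax` caps the number of IDs returned.
    """
    if retmax < 1:
        raise ValueError("retmax must be a positive integer")
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URL; fetching it (for example with urllib.request.urlopen)
    # is left to the caller.
    print(build_geo_search_url("breast cancer", retmax=5))
```

The function only constructs the URL, so it can be inspected without network access. The matching record identifiers would be read from the response returned by the service.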
The community extensively uses GEO data for various applications, including verifying gene expression signatures, incorporating data into their analyses, and developing new statistical algorithms. Despite the diversity of data types and experimental designs, users have performed powerful meta-analyses across thousands of studies. GEO continues to evolve to enhance data integration, cross-comparison, and accessibility for a broad audience. Funding for open access is provided by the National Institutes of Health.