2007 | Tanya Barrett*, Dennis B. Troup, Stephen E. Wilhite, Pierre Ledoux, Dmitry Rudnev, Carlos Evangelista, Irene F. Kim, Alexandra Soboleva, Maxim Tomashevsky and Ron Edgar
The Gene Expression Omnibus (GEO) is a public database at the National Center for Biotechnology Information (NCBI) that archives and freely distributes microarray and other high-throughput data generated by the scientific community. The database is MIAME-compliant, ensuring that data submissions include fully annotated raw and processed data. GEO provides various data deposit options and formats, including web forms, spreadsheets, XML, and SOFT. It also offers user-friendly web-based interfaces and applications to help users explore, visualize, and download data.
GEO is the largest public gene expression resource, containing over 120,000 samples and over 3.2 billion measurements across more than 200 organisms. Data are freely available online and via bulk FTP download. The database structure is designed for efficient storage and retrieval of large-scale functional genomic data. Data are stored in a relational MSSQL database partitioned into three entity types: Platform, Sample, and Series. Each of these entities is under the submitter's editorial control and is assigned a stable and unique accession number.
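The three-entity layout described above can be sketched as a small data model. This is a minimal illustration, not GEO's actual schema: the class fields, the `entity_type` helper and the example accessions are hypothetical. The accession prefixes GPL, GSM and GSE are the ones GEO uses for Platform, Sample and Series records, although the text above does not list them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:
    """Describes the array or other template used for the measurements."""
    accession: str  # e.g. "GPL1" (illustrative)
    title: str


@dataclass
class Sample:
    """One hybridization or run, measured on exactly one Platform."""
    accession: str  # e.g. "GSM1" (illustrative)
    platform: str   # accession of the Platform it refers to
    values: Dict[str, float] = field(default_factory=dict)


@dataclass
class Series:
    """Groups related Samples into one experiment."""
    accession: str  # e.g. "GSE1" (illustrative)
    title: str
    samples: List[str] = field(default_factory=list)


# Hypothetical helper: map an accession prefix to its entity type.
_PREFIXES = {"GPL": "Platform", "GSM": "Sample", "GSE": "Series"}


def entity_type(accession: str) -> str:
    kind = _PREFIXES.get(accession[:3].upper())
    if kind is None:
        raise ValueError(f"unrecognized accession prefix: {accession!r}")
    return kind


platform = Platform("GPL1", "Illustrative array")
sample = Sample("GSM1", platform.accession, {"probe_1": 10.5})
series = Series("GSE1", "Illustrative experiment", [sample.accession])
```

Keeping the references between entities as accession strings rather than object pointers mirrors the idea that each record is independently addressable by a stable identifier.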
GEO defines and creates related data objects to facilitate data mining, visualization, and transposition of submitted data into alternative structures. The principal object used for this purpose is the DataSet object. DataSets allow for the transformation of diverse styles of incoming data from multiple unrelated projects into a standardized format. DataSets provide two discrete renderings of the data: an experiment-centered representation and a gene-centered representation.
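The relationship between the two DataSet renderings can be illustrated as a transposition of the same measurements. The sketch below assumes a simple nested-dictionary layout and uses hypothetical sample and gene names. It is not GEO's internal representation.

```python
from typing import Dict

# Experiment-centered view: sample -> {gene -> value}
ExperimentView = Dict[str, Dict[str, float]]
# Gene-centered view: gene -> {sample -> value}
GeneView = Dict[str, Dict[str, float]]


def gene_centered(experiment_view: ExperimentView) -> GeneView:
    """Pivot per-sample measurements into per-gene profiles."""
    profiles: GeneView = {}
    for sample, measurements in experiment_view.items():
        for gene, value in measurements.items():
            profiles.setdefault(gene, {})[sample] = value
    return profiles


# Hypothetical data: two samples, two genes, one missing measurement.
experiment = {
    "sample_a": {"gene_1": 1.0, "gene_2": 2.0},
    "sample_b": {"gene_1": 3.0},
}
profiles = gene_centered(experiment)
```

Here `profiles["gene_1"]` holds the expression profile of one gene across all samples, which is the shape a gene-centered display would need.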
GEO supports various submission formats, including SOFT, MINiML, and MAGE-ML. Submitters can choose from several options for data submission, and the final GEO records will look similar and contain equivalent information regardless of the deposit method. All submitted data undergo syntactic validation and are inspected by curators for content integrity. Researchers are responsible for the completeness, quality, and accuracy of their submissions.
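As an illustration of the syntactic side of a SOFT deposit, the sketch below reads a small SOFT fragment. It assumes the usual SOFT line conventions, which the text above does not spell out: `^` opens an entity, `!` introduces an attribute, `#` describes a data-table column, and `!..._table_begin` / `!..._table_end` delimit tab-separated rows. The function name and the example content are hypothetical, and this is far from a complete validator.

```python
from typing import Dict, List


def parse_soft(text: str) -> List[Dict]:
    """Parse a SOFT fragment into a list of entity dictionaries."""
    entities: List[Dict] = []
    current = None
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = name"
            kind, _, name = line[1:].partition("=")
            current = {
                "type": kind.strip(),
                "id": name.strip(),
                "attributes": {},
                "columns": {},
                "table": [],
            }
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith("_table_begin"):
                in_table = True
            elif key.lower().endswith("_table_end"):
                in_table = False
            else:
                # Attributes may repeat, so collect values in a list.
                current["attributes"].setdefault(key, []).append(value.strip())
        elif line.startswith("#"):
            column, _, description = line[1:].partition("=")
            current["columns"][column.strip()] = description.strip()
        elif in_table:
            # The first table row is the column header row.
            current["table"].append(line.split("\t"))
    return entities


example = "\n".join([
    "^SAMPLE = demo_sample",
    "!Sample_title = Illustrative sample",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t10.5",
    "!sample_table_end",
])
entities = parse_soft(example)
```

A parser of this kind only checks structure. Content integrity, as noted above, remains a matter for curators and submitters.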
GEO provides a range of tools to retrieve, explore, and visualize data. These tools include traditional data reduction techniques and concise displays designed for human scanning. The Entrez search system serves as the basis for most queries, with Entrez GEO DataSets containing experiment-centered data and Entrez GEO Profiles containing gene-centered data. Graphics are an important tool for visualizing and interpreting high-dimensional expression data.
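The two Entrez views can also be reached programmatically. The sketch below only builds a query URL and makes no network request. It assumes the NCBI E-utilities `esearch` endpoint and the Entrez database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles), none of which are described in the text above. The function name and search term are hypothetical.

```python
from urllib.parse import urlencode

# Assumed E-utilities search endpoint (not mentioned in the text above).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Experiment-centered and gene-centered Entrez databases.
ENTREZ_DB = {"datasets": "gds", "profiles": "geoprofiles"}


def build_query_url(view: str, term: str, retmax: int = 20) -> str:
    """Return an esearch URL for the chosen GEO view and search term."""
    if view not in ENTREZ_DB:
        raise ValueError(f"view must be one of {sorted(ENTREZ_DB)}")
    params = {"db": ENTREZ_DB[view], "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_query_url("datasets", "heart failure")
```

Switching `view` to `"profiles"` sends the same term to the gene-centered database instead.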
GEO continues to develop data retrieval and mining features, and enhance the user experience. Future plans include improving the rendering and representation of non-gene-expression data types that GEO accepts, such as chromatin-immunoprecipitation on arrays (ChIP-chip) studies, array comparative genomic hybridization (aCGH), SNP arrays, and some proteomic data. The integration of GEO data with extensive sequence, mapping, and bibliographic resources via the NCBI Entrez system of interlinked databases further enhances the value and context of the data.