[slides] NCBI GEO: mining millions of expression profiles—database and tools

The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput molecular abundance data, primarily gene expression data. It holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements for over 100 organisms. GEO supports submission, storage, and retrieval of several data types, including microarray-based experiments, serial analysis of gene expression (SAGE), and mass spectrometry proteomic data. Recent developments include user-friendly web-based interfaces for data mining and visualization, which let users explore both whole experiments and individual gene expression profiles. The database is publicly accessible at http://www.ncbi.nlm.nih.gov/geo.

GEO's architecture is built on three main entity types: Platform, Sample, and Series. A Platform describes the elements assayed (for example, the probes on an array); a Sample references one Platform and describes the abundance measurements for its elements; and a Series brings together related Samples that form a single study. Each entity receives a unique accession number. Data are stored as text objects, which keeps the design flexible enough to accommodate new technologies. Recent enhancements include supplementary metadata fields that support MIAME-compliant submissions, as well as acceptance of raw data. Submissions can be made through interactive web forms, as SOFT (Simple Omnibus Format in Text) files, or by FTP in MAGE-ML format. Submitted data may remain private for several months.

GEO DataSets (GDS) are curated collections of Samples that give a coherent synopsis of an experiment and serve as the basis for downstream data mining. Each GDS is also represented as a set of gene-centric expression profiles (GEO Profiles), each showing the measurements for one gene across the Samples in that DataSet; these profiles are indexed for querying. GEO DataSets and Profiles are integrated with other NCBI databases through links to GenBank, PubMed, Gene, UniGene, OMIM, HomoloGene, SNP, Taxonomy, SAGEMap, and MapViewer. Users can query GEO DataSets and Profiles with Boolean phrases, restricting searches by experimental variable, technology type, author, organism, or keyword.
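The Platform, Sample, and Series records described above, each identified by its own accession, can be illustrated with a small reader for SOFT text, one of the submission routes listed. This is a minimal sketch, not GEO's software: it assumes the SOFT line conventions from GEO's format documentation ('^' opens an entity, '!' marks an attribute, '#' describes a data-table column, and a tab-delimited table sits between "!<entity>_table_begin" and "!<entity>_table_end" lines). The names `parse_soft`, `accession_kind`, and `ACCESSION_PREFIXES` are hypothetical, and GPL1/GSM1 are toy accessions; the prefixes GPL, GSM, GSE, and GDS are the ones GEO uses for Platforms, Samples, Series, and DataSets.

```python
# Accession prefixes GEO uses for each record type.
ACCESSION_PREFIXES = {"GPL": "Platform", "GSM": "Sample", "GSE": "Series", "GDS": "DataSet"}


def accession_kind(accession):
    """Return the record type implied by an accession's prefix."""
    return ACCESSION_PREFIXES.get(accession[:3].upper(), "unknown")


def parse_soft(lines):
    """Group SOFT text lines into entities keyed by accession (minimal sketch)."""
    entities = {}
    current = None
    in_table = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity indicator, e.g. "^SAMPLE = GSM1"
            kind, _, accession = line[1:].partition("=")
            current = {"type": kind.strip().upper(), "attributes": {}, "columns": {}, "table": []}
            entities[accession.strip()] = current
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity line
        elif line.startswith("!") and line.lower().endswith("_table_begin"):
            in_table = True
        elif line.startswith("!") and line.lower().endswith("_table_end"):
            in_table = False
        elif in_table:
            # Tab-delimited data row (the first row holds the column names)
            current["table"].append(line.split("\t"))
        elif line.startswith("!"):
            # Attribute, e.g. "!Sample_platform_id = GPL1"; keys may repeat
            key, _, value = line[1:].partition("=")
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            # Description of a data-table column, e.g. "#VALUE = normalized signal"
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
    return entities


example = """^PLATFORM = GPL1
!Platform_title = Example array
^SAMPLE = GSM1
!Sample_title = control replicate 1
!Sample_platform_id = GPL1
#ID_REF = probe identifier
#VALUE = normalized signal
!sample_table_begin
ID_REF\tVALUE
p1\t10.5
p2\t3.2
!sample_table_end
"""
records = parse_soft(example.splitlines())
for accession, record in records.items():
    print(accession, accession_kind(accession), len(record["table"]))
```

In this sketch a Sample points back to its Platform through the `Sample_platform_id` attribute, mirroring the Sample-references-Platform relationship; attribute values are kept as lists because SOFT allows the same attribute key to appear on several lines.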
GEO Profiles are annotated using the Entrez Gene and UniGene resources and can be queried by gene name, GenBank accession, SAGE tag, and similar identifiers. Supplementary tools, including cluster heat maps, Query subset A versus B, Subset effects, Value distribution, and GEO BLAST, support data mining, visualization, and analysis. Future plans include continued development of submission and retrieval formats, further integration with other NCBI resources, and enhanced data visualization and mining features. By enabling cross-comparison of datasets, GEO helps researchers gain insight into gene function and genetic networks, making it a comprehensive public collection of gene expression data for the scientific community.
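The Query subset A versus B tool compares two groups of Samples within a DataSet. The statistic GEO itself applies is not described here, so the following is only an illustration under that assumption: it ranks hypothetical profiles by Welch's two-sample t statistic between subsets A and B. The function names `welch_t` and `rank_profiles`, the sample identifiers, and the input layout (a mapping from profile identifier to per-Sample values) are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(subset_a, subset_b):
    """Welch's two-sample t statistic (unequal variances allowed)."""
    if len(subset_a) < 2 or len(subset_b) < 2:
        raise ValueError("each subset needs at least two values")
    diff = mean(subset_a) - mean(subset_b)
    # Squared standard error of the difference in means.
    se2 = variance(subset_a) / len(subset_a) + variance(subset_b) / len(subset_b)
    if se2 == 0:
        # Both subsets are constant: there is no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def rank_profiles(profiles, subset_a, subset_b):
    """Rank hypothetical profiles by |t| between two Sample subsets.

    profiles: dict mapping a profile id to a dict {sample_id: value}
    subset_a, subset_b: lists of sample ids (hypothetical identifiers)
    """
    scores = []
    for profile_id, values in profiles.items():
        a = [values[s] for s in subset_a]
        b = [values[s] for s in subset_b]
        scores.append((profile_id, welch_t(a, b)))
    # Largest absolute difference first, regardless of direction.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


profiles = {
    "profile_1": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0, "s6": 6.0},
    "profile_2": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 1.0, "s5": 2.0, "s6": 3.0},
}
ranking = rank_profiles(profiles, ["s1", "s2", "s3"], ["s4", "s5", "s6"])
print(ranking[0][0])  # profile_1 differs most between the subsets
```

Welch's form is used here instead of the pooled-variance t-test because the two subsets need not share a variance; a real analysis would also convert t to a p-value and correct for testing thousands of profiles at once.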


2005 | Tanya Barrett, Tugba O. Suzek, Dennis B. Troup, Stephen E. Wilhite, Wing-Chi Ngau, Pierre Ledoux, Dmitry Rudnev, Alex E. Lash, Wataru Fujibuchi and Ron Edgar