20 April 2011; Revised 12 June 2011; Accepted 16 June 2011 | Rhoda J. Kinsella1*, Andreas Kähäri1, Syed Haider2, Jorge Zamora1, Glenn Proctor1, Giulietta Spudich1, Jeff Almeida-King1, Daniel Staines1, Paul Derwent1, Arnaud Kerhornou1, Paul Kersey1 and Paul Flicek1,*
The article provides a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts, which are data retrieval hubs for genomic data across a broad taxonomic range. The Ensembl project, launched in 2000, offers high-quality genomic resources including gene annotations, sequence alignments, and variation data for 56 species, with a focus on chordates and model organisms. The Ensembl Genomes project, initiated in 2009, extends this coverage to five additional taxonomic groups: protists, bacteria, fungi, plants, and metazoa, representing 313 non-vertebrate species.
The BioMarts, built using the BioMart data management system, provide a flexible and fast means of querying these data. They include seven databases (three hidden and four visible) that cover gene annotations, variation data, regulatory information, and other genomic features. The BioMarts are accessible via web interfaces, APIs, and MySQL servers, and are integrated with other bioinformatics resources like InterPro, dbSNP, and Reactome.
The article also presents several query examples demonstrating the utility of the BioMarts, such as finding human genes coding for specific protein domains, identifying genomic regions affected by structural variations, and retrieving sequence information for genes of interest. Future plans include incorporating new data types and species into the BioMarts and transitioning to the new BioMart 0.8 code.
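To make the first query example more concrete, the sketch below builds a BioMart XML query for human genes annotated with a given InterPro protein domain, in the general form accepted by the BioMart web service. It is an illustration under stated assumptions and is not taken from the article: the endpoint URL, the dataset name `hsapiens_gene_ensembl`, the filter name `interpro`, the attribute names, and the placeholder accession `IPR000001` should all be checked against the live BioMart configuration before use. The code only constructs the query and its URL; it does not contact any server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the Ensembl BioMart web service; verify before use.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Return a BioMart XML query string for a single dataset.

    dataset    -- dataset name (assumed, e.g. "hsapiens_gene_ensembl")
    filters    -- dict mapping filter name to filter value
    attributes -- list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",        # tab-separated output
        "header": "0",             # no header row
        "uniqueRows": "1",         # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_url(xml_query):
    """Encode the XML query as the 'query' parameter of the service URL."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml_query})


# Hypothetical example: genes with a given InterPro domain accession.
example_xml = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"interpro": "IPR000001"},   # placeholder accession
    attributes=["ensembl_gene_id", "chromosome_name"],
)
example_url = build_url(example_xml)
print(example_xml)
```

The resulting URL could then be fetched with any HTTP client, or the same filters and attributes could be selected interactively through the BioMart web interface. Building the XML with a library rather than by string concatenation keeps filter values correctly escaped.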