Understanding the NCBI Taxonomy database

The NCBI Taxonomy database is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising GenBank, EMBL, and DDBJ. It provides taxonomic information for all organisms represented in the INSDC sequence databases. The database is manually curated by a small group of scientists at NCBI, who use current taxonomic literature to maintain a phylogenetic taxonomy. It serves as a central hub for many NCBI resources, enabling clustering, internal linking, and linking to external resources.

The NCBI Taxonomy project began in 1991, when the first version of Entrez was developed. Initially, each INSDC partner maintained its own taxonomic nomenclature, which led to inconsistencies. Entrez was the first system to link nucleotide and protein sequences with scientific literature, and this required a unified taxonomy. The first merged taxonomy was produced by combining the taxonomies of the different databases, and it was itself inconsistent, so workshops were held to improve the classification. The INSDC partners later agreed to resolve taxonomic issues before new sequence data was released, which allowed for more accurate taxonomy consultations.

At the time of publication, the database included 234,991 species with formal names and 405,546 with informal names. Of the species with formal names, 11,110 were prokaryotic and 221,263 were eukaryotic. The database also included 95 extinct species. It records several name types, such as scientific names, synonyms, and informal names.

The database is stored in an SQL Server relational database called TAXON. Public access is provided through the Taxonomy Browser, the Taxonomy domain of Entrez, and the taxonomy FTP site. The Taxonomy Browser provides hierarchy pages and taxon-specific pages for searching and browsing the taxonomy; other tools include the name/id status page and the common tree viewer. Searches can use multiple search fields, Boolean queries, and wildcards.

The taxonomy database is crucial for organizing and linking data across NCBI domains, and it is continuously updated and maintained to reflect current taxonomic knowledge.
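The fielded Boolean searches mentioned above can be illustrated with a minimal Python sketch that only builds a query string and a request URL, without contacting NCBI. Several details are assumptions not stated in this summary: the E-utilities ESearch endpoint as the programmatic route into the Taxonomy domain of Entrez, the field names Subtree and Rank, and the helper names field_term, boolean_query, and esearch_url, which are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint is used to reach
# the Taxonomy domain of Entrez; the summary above does not name it.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value: str, field: str) -> str:
    """Format one fielded search term, e.g. 'Primates[Subtree]'."""
    return f"{value}[{field}]"


def boolean_query(*terms: str, op: str = "AND") -> str:
    """Join fielded terms with a single uppercase Boolean operator."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {op!r}")
    return f" {op} ".join(terms)


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request URL for the taxonomy database."""
    params = {"db": "taxonomy", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical example: species-rank entries below the Primates node.
# 'Subtree' and 'Rank' are assumed field names for illustration.
query = boolean_query(
    field_term("Primates", "Subtree"),
    field_term("species", "Rank"),
)
url = esearch_url(query)
print(query)
print(url)
```

Keeping query construction separate from URL building makes it easy to inspect the Boolean expression before any request is sent; a wildcard term such as a name prefix followed by an asterisk could be passed through boolean_query in the same way.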

The NCBI Taxonomy database

2012 | Scott Federhen