2020 | Conrad L. Schoch*, Stacy Ciufo, Mikhail Domrachev, Carol L. Hotton, Sivakumar Kannan, Rogneda Khovanskaya, Detlef Leipe, Richard McVeigh, Kathleen O'Neill, Barbara Robbertse, Shobha Sharma, Vladimir Soussov, John P. Sullivan, Lu Sun, Seán Turner and Ilene Karsch-Mizrachi
The National Center for Biotechnology Information (NCBI) Taxonomy is a comprehensive resource that curates organismal names and classifications for nucleotide and protein sequence databases. Since its inception in 1991, it has evolved from a single SQL database to a series of linked databases centered around the NameBank framework. This transition has improved the annotation of synonyms, the tracking of publications, and the handling of scientific authorities and types. The NCBI Taxonomy now includes both formal and informal names, with detailed tracking of the relationships among data elements. The major public sequence databases in the International Nucleotide Sequence Database Collaboration (INSDC) use the resource to keep their taxonomic classification consistent. The article discusses the challenges of managing taxonomic information and the solutions adopted, including the use of multiple codes of nomenclature, the documentation of type material, and the curation of specific taxonomic groups such as prokaryotes, green plants, fungi, unicellular eukaryotes, metazoa, and viruses. The NCBI Taxonomy has significantly expanded its coverage of known species, with over 460,000 TaxNodes as of 2020, and continues to evolve to better capture and communicate taxonomic information.