Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

2016 | Nuala A. O'Leary, Mathew W. Wright, J. Rodney Brister, Stacy Ciufo, Diana Haddad, Rich McVeigh, Bhanu Rajput, Barbara Robbertse, Brian Smith-White, Danso Ako-Adjei, Alexander Astashyn, Azat Badretdin, Yiming Bao, Olga Blinkova, Vyacheslav Brover, Vyacheslav Chetvernin, Jinna Choi, Eric Cox, Olga Ermolaeva, Catherine M. Farrell, Tamara Goldfarb, Tripti Gupta, Daniel Haft, Eneida Hatcher, Wratko Hlavina, Vinita S. Joardar, Vamsi K. Kodali, Wenjun Li, Donna Maglott, Patrick Masterson, Kelly M. McGarvey, Michael R. Murphy, Kathleen O'Neill, Shashikant Pujar, Sanjida H. Rangwala, Daniel Rausch, Lillian D. Riddick, Conrad Schoch, Andrei Shkeda, Susan S. Storz, Hanzhen Sun, Francoise Thibaud-Nissen, Igor Tolstoy, Raymond E. Tully, Anjana R. Vatsan, Craig Wallin, David Webb, Wendy Wu, Melissa J. Landrum, Avi Kimchi, Tatiana Tatusova, Michael DiCuccio, Paul Kitts, Terence D. Murphy and Kim D. Pruitt
The RefSeq database at the National Center for Biotechnology Information (NCBI) is a publicly available, curated collection of annotated genomic, transcript, and protein sequence records. The RefSeq project draws on data from the International Nucleotide Sequence Database Collaboration (INSDC) and combines computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. These sequences are augmented with current knowledge, including publications, functional features, and informative nomenclature. The database currently represents sequences from more than 55,000 organisms, including more than 4800 viruses, 40,000 prokaryotes, and 10,000 eukaryotes. Improvements to the annotation pipelines have expanded both the depth and the breadth of taxa in the dataset. RefSeq records are generated by different methods depending on the sequence class and organism, and they support applications such as taxonomic validation, genome annotation, comparative genomics, and clinical testing. The authors also highlight diverse functional curation initiatives that support these multiple uses of RefSeq data. The dataset can be accessed through NCBI's Nucleotide and Protein databases, through BLAST databases, and by file transfer protocol (FTP). It has grown substantially in recent years: RefSeq FTP release 71 contains over 77 million sequence records for more than 55,000 organisms. The project continues to extend the collection to more diverse organisms and to improve data access and functional annotation.
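Because records are served through separate Nucleotide and Protein databases, a client has to know which database a given RefSeq record lives in. As a minimal sketch, assuming the standard RefSeq accession prefixes (an abbreviated table drawn from NCBI's general accession conventions, not from the paper) and the public E-utilities `efetch` endpoint, the Python below maps an accession to its Entrez database (`nuccore` or `protein`) and builds a FASTA request URL. `REFSEQ_PREFIXES`, `classify`, and `efetch_fasta_url` are hypothetical helper names; the code only constructs the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumed, abbreviated table based on NCBI's RefSeq accession conventions
# (not taken from the paper): prefix -> (Entrez db, molecule type, description).
# N* prefixes are typically curated records; X* prefixes are predicted models.
REFSEQ_PREFIXES = {
    "NC": ("nuccore", "genomic", "complete genomic molecule"),
    "NG": ("nuccore", "genomic", "incomplete genomic region"),
    "NT": ("nuccore", "genomic", "contig or scaffold"),
    "NW": ("nuccore", "genomic", "contig or scaffold, primarily WGS"),
    "NZ": ("nuccore", "genomic", "complete genome or unfinished WGS data"),
    "NM": ("nuccore", "mRNA", "protein-coding transcript, usually curated"),
    "NR": ("nuccore", "RNA", "non-protein-coding transcript"),
    "XM": ("nuccore", "mRNA", "predicted model protein-coding transcript"),
    "XR": ("nuccore", "RNA", "predicted model non-protein-coding transcript"),
    "NP": ("protein", "protein", "protein linked to an NM_ or NC_ record"),
    "XP": ("protein", "protein", "predicted model protein"),
    "YP": ("protein", "protein", "protein annotated on a genomic molecule"),
    "WP": ("protein", "protein", "non-redundant protein shared across genomes"),
}

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def classify(accession: str) -> tuple:
    """Return (Entrez db, molecule type, description) for a RefSeq accession."""
    # RefSeq accessions carry a two-letter prefix followed by an underscore.
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


def efetch_fasta_url(accession: str) -> str:
    """Build (but do not send) an efetch URL that returns the record as FASTA."""
    db, _, _ = classify(accession)
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


if __name__ == "__main__":
    print(efetch_fasta_url("NM_000546.6"))
```

Running the script prints `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_000546.6&rettype=fasta&retmode=text`. Keying on the prefix keeps the lookup offline; a real client would also need to respect NCBI's E-utilities usage policies, such as request rate limits.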
The authors also describe efforts to further expand the taxonomic representation of the collection and to provide phylogenetically useful datasets.