Submitted 24 November 2010; Accepted 10 March 2011 | Michele Magrane1,* and UniProt Consortium1,2,3
The UniProt Knowledgebase (UniProtKB) is a central hub for integrated protein data, providing a unified view of protein sequence and functional information. It is produced by the UniProt Consortium, which comprises groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProtKB consists of two sections: UniProtKB/Swiss-Prot, which is manually curated, and UniProtKB/TrEMBL, which is automatically generated. As of release 2011_01, UniProtKB contains over 13.5 million entries, with 524,420 in Swiss-Prot and 13,069,501 in TrEMBL. The database integrates sequences from several resources, including the International Nucleotide Sequence Database Collaboration (INSDC) and other databases such as Ensembl and RefSeq. UniProtKB also provides cross-references to over 120 external databases, linking each entry to related data held elsewhere. The consortium adds data through both manual and automatic annotation procedures, with a focus on high-quality information. Manual curation involves critical review of experimental data and literature, while automatic annotation uses rules and decision trees to predict protein properties. UniProtKB provides a range of tools for querying and analyzing data, including text searches, sequence similarity searches, and multiple sequence alignments. The data are released every 4 weeks in various formats to facilitate integration with other databases. Future plans include improving automatic annotation systems, expanding cross-references, and incorporating additional data sources such as variant data and protein-protein interactions.
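As a quick check on the release figures quoted above, the following minimal sketch sums the two section counts and computes each section's share of the total. The function name `release_summary` and the constant names are illustrative assumptions, not part of any UniProt tooling; only the two entry counts come from the text.

```python
# Entry counts for UniProtKB release 2011_01, as quoted in the text.
SWISS_PROT_ENTRIES = 524_420     # manually curated section
TREMBL_ENTRIES = 13_069_501      # automatically generated section


def release_summary(swiss_prot: int, trembl: int) -> dict:
    """Return the total entry count and each section's fraction of it.

    Hypothetical helper for illustration only.
    """
    total = swiss_prot + trembl
    return {
        "total": total,
        "swiss_prot_fraction": swiss_prot / total,
        "trembl_fraction": trembl / total,
    }


summary = release_summary(SWISS_PROT_ENTRIES, TREMBL_ENTRIES)
print(f"Total entries:    {summary['total']:,}")
print(f"Swiss-Prot share: {summary['swiss_prot_fraction']:.1%}")
print(f"TrEMBL share:     {summary['trembl_fraction']:.1%}")
```

The total comes to 13,593,921 entries, consistent with the "over 13.5 million" figure, and shows that the manually curated section accounts for only a small fraction of the knowledgebase (roughly 4%).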