2012, Vol. 40, Database issue | The UniProt Consortium
The UniProt Consortium, comprising institutions from the UK, Switzerland, the US, and other countries, aims to support biological research by providing a comprehensive and freely accessible protein sequence knowledgebase. UniProt is structured into four main components: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Metagenomic and Environmental Sequence Database (UniMES).
A key development is the provision of complete, reference, and representative proteomes. To manage the exponential increase in sequence data, UniProt is reorganizing its data representation so that sequence and functional information can be used to best effect. This includes creating a core subset of UniProtKB, selected by criteria that will evolve over time, so that users find relevant and well-annotated sequences. A complete proteome is defined as the entire set of proteins expressed by a specific organism; some complete proteomes also include sequences derived from high-quality cDNAs. UniProt also defines reference proteomes and representative proteomes (RPs) to provide broad coverage and reduce redundancy. The consortium has developed automated methods for annotating uncharacterized proteins and has updated its website to include new biocuration pages and interactive workshops to enhance user experience. UniProt offers various tools for data access and querying, including full-text and field-based text search, sequence similarity search, and multiple sequence alignment. The consortium values user feedback and continuously improves its databases and services.