Reorganizing the protein space at the Universal Protein Resource (UniProt)

The UniProt Consortium has reorganized the protein sequence space to improve data accessibility and usability. UniProt provides a comprehensive, stable, and well-annotated protein sequence database with extensive cross-references and querying interfaces. It consists of four main components: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Metagenomic and Environmental Sequences database (UniMES). The reorganization aims to represent the sequence space more efficiently and accurately, so that users can navigate and search a growing number of proteome sequences.

The UniProtKB core subset is designed to include the most relevant and well-annotated sequences, while redundant sequences are kept in a non-core subset. A complete proteome is defined as the entire set of proteins expressed by a specific organism and is based on genome translations or high-quality cDNA sequences. Reference proteomes are selected to provide broad coverage of the tree of life and represent a cross-section of taxonomic diversity. Representative proteomes are computationally derived to best represent sequence space and annotation.

New biocuration pages on the UniProt website provide detailed information on the manual curation process, including the integration and interpretation of data from various sources. The UniProt website also offers tools for searching and retrieving data, including full-text and field-based text search, sequence similarity search, and database identifier mapping. The site provides various download formats and supports programmatic access via HTTP requests. UniProt has also developed a BioMart interface for integrated querying of biological data resources. The UniProt Consortium is committed to improving the accuracy and representation of its databases and services, and encourages user feedback.
Funding for UniProt comes from various sources, including the National Institutes of Health, the European Commission, and the British Heart Foundation. The UniProt website is freely available for both commercial and non-commercial use.
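
The abstract mentions programmatic access via HTTP requests without giving details. As a rough illustration only, the sketch below builds a text-search URL and optionally downloads the result. The base URL, the query parameters (`query`, `format`, `fields`, `size`), and the field names are assumptions drawn from the present-day UniProt REST service rather than from this abstract, and the interface available in 2012 may have differed. The helper names `build_search_url` and `fetch` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint of the current UniProt REST service; not stated in the abstract.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fmt="tsv",
                     fields=("accession", "id", "protein_name"), size=5):
    """Return a URL for a full-text or field-based search of UniProtKB."""
    params = {
        "query": query,              # e.g. "insulin" or "organism_id:9606"
        "format": fmt,               # download format, e.g. tsv, fasta, json
        "fields": ",".join(fields),  # columns to return (assumed field names)
        "size": size,                # maximum number of entries to return
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(url, timeout=30):
    """Download the response body as text. Requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch(url) to actually retrieve the data.
    url = build_search_url("insulin AND organism_id:9606")
    print(url)
```

Keeping URL construction separate from the download makes the request easy to inspect or paste into a browser before any data is retrieved.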

2012 | The UniProt Consortium