[slides] UniProt: the universal protein knowledgebase

UniProt is a comprehensive protein knowledgebase that provides annotations for over 60 million protein sequences. It includes both manually curated sequences (Swiss-Prot) and automatically annotated sequences (TrEMBL): Swiss-Prot contains over 550,000 sequences reviewed by expert curators, while TrEMBL contains over 60 million sequences derived from high-throughput sequencing.

UniProt also provides reference proteomes for over 56,000 species, covering a broad range of taxonomic diversity. Reference proteomes are selected through community consultation or computational clustering, and they are the focus of both manual and automatic annotation. In addition, UniProt offers pan proteomes, which contain all non-redundant sequences for a group of related organisms and are used for phylogenetic comparisons and studies of genome evolution. A pipeline that removes redundant proteomes has reduced the number of sequences in the database by 47 million, improving its scalability and usability.

The UniProt Archive (UniParc) provides a complete set of known sequences, including obsolete historical sequences. UniProt data can also be queried through a SPARQL endpoint that holds 22 billion RDF triples.

Manual curation focuses on high-quality annotation of experimentally characterized proteins. Post-translational modifications (PTMs) are a key curation area because they play a crucial role in protein function and regulation, and a semi-automatic pipeline adds PTMs reported in large-scale, high-throughput proteomics publications. UniProt also provides automatic annotation through rule-based systems such as UniRule and SAAS, which use the hierarchical InterPro classification of protein family and domain signatures. These systems have substantially increased both the coverage and the accuracy of annotation.
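The rule-based automatic annotation described above can be illustrated with a small sketch. It assumes a deliberately simplified rule format (the real UniRule and SAAS rules have richer conditions and a different syntax): a rule fires when an unreviewed entry matches all required signatures and belongs to the required taxon. The signature hierarchy is modeled as a hypothetical child-to-parent map, so a rule written for a family also matches members of its subfamilies. Every name here (`Rule`, `Entry`, `apply_rules`, `SIG_FAMILY`, `RULE_1`, and so on) is a hypothetical placeholder, not an InterPro or UniRule identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """A simplified annotation rule (hypothetical format, not the real UniRule syntax)."""
    rule_id: str
    required_signatures: frozenset  # signature IDs that must all be present
    required_taxon: str             # taxon name that must appear in the lineage
    annotations: tuple              # annotations propagated when the rule fires


@dataclass
class Entry:
    accession: str
    signatures: set
    lineage: list
    annotations: list = field(default_factory=list)


def with_ancestors(signatures, parent_of):
    """Add every ancestor of each signature, walking an acyclic child -> parent map."""
    expanded = set(signatures)
    for sig in signatures:
        current = parent_of.get(sig)
        while current is not None:
            expanded.add(current)
            current = parent_of.get(current)
    return expanded


def apply_rules(entry, rules, parent_of):
    """Attach annotations from every rule whose conditions the entry satisfies."""
    matched = with_ancestors(entry.signatures, parent_of)
    for rule in rules:
        if rule.required_signatures <= matched and rule.required_taxon in entry.lineage:
            for annotation in rule.annotations:
                # Keep the rule ID as provenance for each automatic annotation.
                entry.annotations.append((annotation, rule.rule_id))
    return entry


# Toy data: every ID, name and annotation below is hypothetical.
parent_of = {"SIG_SUBFAMILY": "SIG_FAMILY"}
rules = [
    Rule("RULE_1", frozenset({"SIG_FAMILY"}), "Bacteria", ("Function: example kinase",)),
    Rule("RULE_2", frozenset({"SIG_FAMILY"}), "Eukaryota", ("Keyword: Example",)),
]
entry = Entry("ENTRY_1", {"SIG_SUBFAMILY"}, ["cellular organisms", "Bacteria"])
apply_rules(entry, rules, parent_of)
print(entry.annotations)  # [('Function: example kinase', 'RULE_1')]
```

Storing the rule ID alongside each annotation keeps automatically assigned annotations traceable to the rule that produced them, which matters whenever a rule is revised or withdrawn.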
UniProt has also developed tools for mapping proteomics data to UniProt entries, improving the integration of experimental data with the database. The UniProt website has gained several new features: the ProtVista feature viewer, which provides an integrated view of protein features; a 'Publications' view for UniProtKB entries, which lets users filter and access the publications relevant to a protein; and a 'Peptide search' tool that quickly finds UniProtKB sequences matching a given peptide sequence. UniProt continues to evolve to meet the needs of the scientific community as a reliable and comprehensive resource for protein information. The database is accessible via the website (http://www.uniprot.org/) and through a SPARQL endpoint (http://sparql.uniprot.org/). UniProt encourages authors to include UniProt accession numbers in scientific papers to strengthen the connection between the literature and the databases.
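The 'Peptide search' tool can be approximated, for illustration, as an exact substring match over a set of sequences. This minimal sketch assumes a small in-memory dictionary keyed by hypothetical accessions; the real service searches all of UniProtKB and presumably relies on an index rather than a linear scan. The `equate_il` flag is an assumed option that treats isoleucine and leucine as equivalent, since the two residues have the same mass and are hard to tell apart in mass spectrometry data.

```python
def peptide_search(peptide, sequences, equate_il=False):
    """Return the sorted accessions whose sequence contains `peptide` exactly."""

    def normalize(seq):
        seq = seq.upper()
        # Leucine (L) and isoleucine (I) have the same mass, so optionally
        # map I to L on both sides before comparing.
        return seq.replace("I", "L") if equate_il else seq

    query = normalize(peptide)
    # Linear scan for clarity; a large-scale service would need an index.
    return sorted(acc for acc, seq in sequences.items() if query in normalize(seq))


# Hypothetical toy sequences keyed by placeholder accessions.
sequences = {
    "PROT_A": "MKTLLILAVAA",
    "PROT_B": "MSEIKKLAG",
    "PROT_C": "MKTAYIAKQR",
}

print(peptide_search("LLIL", sequences))                  # ['PROT_A']
print(peptide_search("LKKL", sequences))                  # []
print(peptide_search("LKKL", sequences, equate_il=True))  # ['PROT_B']
```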

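The SPARQL endpoint should accept standard SPARQL protocol requests, so a query can be sent as an HTTP GET with the query text in a `query` parameter. The sketch below builds such a request with the Python standard library. The endpoint path `/sparql` and the example query, which lists a few reviewed (Swiss-Prot) entries using the UniProt core vocabulary (`up:Protein`, `up:reviewed`), are assumptions to check against the current UniProt SPARQL documentation. Only `run_query` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint URL; the text gives the host as http://sparql.uniprot.org/.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Assumed example query: a few reviewed (Swiss-Prot) entries, using the UniProt core vocabulary.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
           up:reviewed true .
}
LIMIT 5
"""


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build (without sending) a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(request, timeout=60):
    """Send the request and return the result bindings (requires network access)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


request = build_sparql_request(QUERY)
# bindings = run_query(request)  # uncomment to query the live endpoint
```

Asking for `application/sparql-results+json` keeps parsing to a single `json.load`, because the SPARQL JSON results format places each result row under `results` and then `bindings`.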

2017 | The UniProt Consortium