[slides and audio] UniProt Knowledgebase: a hub of integrated protein data

The UniProt Knowledgebase (UniProtKB) is a central hub for protein sequence and functional information, integrating data from multiple sources to provide a unified view of protein knowledge. It is maintained by the UniProt Consortium, which comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProtKB consists of two sections: UniProtKB/Swiss-Prot, which contains manually curated entries, and UniProtKB/TrEMBL, which contains automatically generated entries. The database holds over 13.5 million entries: 524,420 in Swiss-Prot and 13,069,501 in TrEMBL.

UniProtKB draws sequences from several sources, including the International Nucleotide Sequence Database Collaboration (INSDC), and accepts protein sequences determined by direct sequencing through the SPIN submission tool. Published literature is also searched to identify peptide sequences that were never submitted. Collaborations with PDBe, Ensembl, and RefSeq help ensure comprehensive coverage of protein sequences. The International Protein Index (IPI) provides proteome sets for various species, and UniProtKB works with Ensembl to provide complete proteome sets.

UniProtKB adds value to each protein sequence record with detailed information on function, structure, interactions, and sequence features, derived from both manual curation and automatic annotation. Manual curation involves verifying sequences, reviewing experimental data, and compiling the information into concise reports. Automatic annotation uses tools such as UniRule and SAAS to predict protein features based on known proteins.

UniProtKB provides cross-references to over 120 databases, giving access to complementary information. It also integrates data from other resources, such as Ensembl and RefSeq, to ensure comprehensive coverage. The database supports a wide range of data formats and provides tools for querying and analyzing data.
It also includes Gene Ontology (GO) annotations, and evidence attribution lets users trace the origin of each piece of information.

UniProtKB is updated every four weeks and is freely available online. It supports programmatic access via HTTP requests and provides a Java API for remote access. The database also offers BioMart for integrated querying across multiple biological data resources.

Future plans include expanding data sources, improving automatic annotation systems, and enhancing data integration with other databases. The UniProt Consortium continues to ensure data quality and consistency through collaboration with external resources.
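
The programmatic access via HTTP requests mentioned above might look roughly like the following Python sketch. It is illustrative only and is not taken from the paper. The base URL is an assumption: it is the address of the present-day UniProt REST service, not necessarily the one in use in 2011. The helper names `entry_url`, `parse_fasta_header`, and `fetch_entry` are hypothetical. The header parser relies on the `sp`/`tr` prefix that UniProt FASTA headers use to mark Swiss-Prot and TrEMBL entries.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the current UniProt REST service (not given in the source).
BASE_URL = "https://rest.uniprot.org/uniprotkb"

# FASTA header prefixes for the two UniProtKB sections.
SECTIONS = {"sp": "UniProtKB/Swiss-Prot", "tr": "UniProtKB/TrEMBL"}


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Build the URL for one entry in a given format (e.g. fasta, txt, xml)."""
    return f"{BASE_URL}/{quote(accession)}.{fmt}"


def parse_fasta_header(header: str) -> dict:
    """Split a UniProt-style FASTA header into section, accession, and entry name."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    # Header layout: >db|ACCESSION|ENTRY_NAME free-text description
    db, accession, rest = header[1:].split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    return {
        "section": SECTIONS.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
    }


def fetch_entry(accession: str, fmt: str = "fasta") -> str:
    """Download one entry over HTTP (requires network access)."""
    with urlopen(entry_url(accession, fmt), timeout=30) as response:
        return response.read().decode("utf-8")
```

With network access, `fetch_entry("P12345")` should return the entry as FASTA text, and passing its first line to `parse_fasta_header` indicates whether the entry is manually curated (Swiss-Prot) or automatically generated (TrEMBL).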

UniProt Knowledgebase: a hub of integrated protein data

2011 | Michele Magrane and UniProt Consortium