Published online 27 October 2014 | The UniProt Consortium
UniProt is a comprehensive resource for protein sequences and their functional annotation. It has grown rapidly, doubling its sequence count to 80 million in the past year. To accommodate this growth, UniProt has extended its accession number format from 6 to 10 characters and introduced a new proteome identifier to track the provenance of sequences. The database comprises manually curated entries in UniProtKB/Swiss-Prot (about half a million sequences) and unreviewed entries in UniProtKB/TrEMBL (about 80 million sequences). UniProt also provides the UniRef100, UniRef90, and UniRef50 sets, which cluster sequences at 100%, 90%, and 50% sequence identity to reduce redundancy, and UniParc, an archive of all known protein sequences. UniProt is cross-referenced with more than 150 other databases and is freely available online.
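The accession and proteome identifier formats mentioned above can be checked with regular expressions. The sketch below uses the accession pattern published in UniProt's documentation, which covers both the original 6-character form and the extended 10-character form; the proteome identifier pattern ("UP" followed by nine digits) is an assumption on our part. Both should be checked against the current UniProt help pages before being relied on.

```python
import re

# Accession pattern as given in UniProt's documentation: the classic
# 6-character form, or the extended form of 6 or 10 characters.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Assumed proteome identifier format: "UP" followed by nine digits.
PROTEOME_RE = re.compile(r"UP[0-9]{9}")


def is_accession(identifier: str) -> bool:
    """True if the whole string matches the accession pattern."""
    return ACCESSION_RE.fullmatch(identifier) is not None


def is_proteome_id(identifier: str) -> bool:
    """True if the whole string matches the assumed proteome ID pattern."""
    return PROTEOME_RE.fullmatch(identifier) is not None


for candidate in ["P69905", "A0A022YWF9", "P6990", "UP000005640"]:
    print(candidate, is_accession(candidate), is_proteome_id(candidate))
```

Here the first two strings pass the accession check, the truncated "P6990" fails both checks, and only the last string passes the proteome check. Using `fullmatch` rather than `search` rejects strings that merely contain a valid identifier.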
The article describes the challenges of, and progress in, manual curation, particularly the expert curation of enzymes, including orphan enzymes (enzyme activities for which no protein sequence is yet known). In 2013, UniProt curated information from over 8400 papers and created 3300 new entries. Curators focus on selecting representative publications so that each entry gives a complete overview of the available information. Automatic annotation systems such as UniRule and SAAS apply rules, derived from the curated data in UniProtKB/Swiss-Prot, to annotate uncharacterized sequences.
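Rule-based annotation of the kind UniRule and SAAS perform can be illustrated with a toy model: a rule lists the conditions a sequence must satisfy and the annotations to transfer when it does. Everything below is hypothetical. The class names, the signature labels such as `SIG-A`, and the single taxonomic-scope condition are illustrative simplifications, not the UniRule or SAAS rule format, whose real rules carry further conditions not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical annotation rule (not the real UniRule/SAAS format)."""
    rule_id: str
    required_signatures: frozenset  # every signature must be present
    taxon_scope: str                # rule fires only inside this taxon
    annotations: dict               # annotations transferred when it fires


@dataclass
class Protein:
    """Hypothetical unreviewed sequence record."""
    accession: str
    signatures: set
    lineage: list                   # taxonomic lineage, root first
    annotations: dict = field(default_factory=dict)


def apply_rules(protein: Protein, rules: list) -> list:
    """Transfer annotations from every rule whose conditions the protein meets.

    Returns the IDs of the rules that fired, in order.
    """
    fired = []
    for rule in rules:
        has_signatures = rule.required_signatures <= protein.signatures
        in_scope = rule.taxon_scope in protein.lineage
        if has_signatures and in_scope:
            protein.annotations.update(rule.annotations)
            fired.append(rule.rule_id)
    return fired


# Usage with made-up rules and a made-up record.
rules = [
    Rule("RULE-1", frozenset({"SIG-A", "SIG-B"}), "Bacteria",
         {"protein_name": "Example enzyme"}),
    Rule("RULE-2", frozenset({"SIG-C"}), "Eukaryota",
         {"function": "Example function"}),
]
record = Protein("HYPO1", {"SIG-A", "SIG-B"}, ["cellular organisms", "Bacteria"])
print(apply_rules(record, rules))   # ['RULE-1']
print(record.annotations)           # {'protein_name': 'Example enzyme'}
```

Because annotations are merged with `dict.update`, a later rule silently overwrites an earlier one on the same key; a production system would need explicit conflict handling.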
A new, more user-friendly website has been developed, with improved navigation, enhanced search functionality, and a more structured presentation of annotation data. An annotation score has been introduced to help users identify well-characterized proteins for comparative analysis. UniProt's impact is reflected in citations across many research areas, including biochemistry, biotechnology, and computational biology. The article concludes by emphasizing UniProt's ongoing commitment to organizing and annotating protein information to support scientific discovery.