Received September 14, 2004; Revised and Accepted October 5, 2004 | Amos Bairoch, Rolf Apweiler1,*, Cathy H. Wu2, Winona C. Barker3, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzan Huang2, Rodrigo Lopez1, Michele Magrane1, Maria J. Martin1, Darren A. Natale2, Claire O'Donovan1, Nicole Redaschi and Lai-Su L. Yeh3
The Universal Protein Resource (UniProt) is a centralized, authoritative resource for protein sequences and functional information, created by joining the Swiss-Prot, TrEMBL, and PIR protein databases. UniProt is organized in three layers: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt), and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, richly annotated database that combines manually curated entries from Swiss-Prot with automatically annotated entries from TrEMBL. Improvements made in 2004 include a new comment line topic for toxic dose information, enhanced keyword documentation, and improved annotation of post-translational modifications.
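The toxic dose information is carried as a topic on the comment (CC) lines of an entry. As a rough illustration of how such topics can be pulled out of a flat-file entry, the sketch below groups `CC   -!- TOPIC: text` blocks together with their continuation lines. It assumes the Swiss-Prot flat-file column layout (topic lines begin with `CC   -!- `, continuation lines begin with `CC` followed by seven spaces); the function name `parse_cc_topics` and the sample record are hypothetical, and the sample text is placeholder rather than annotation from a real entry. For real data, an established parser such as Biopython's `Bio.SwissProt` module is the safer choice.

```python
def parse_cc_topics(lines):
    """Collect comment (CC) topics from one flat-file entry.

    Assumed layout: a topic line starts with "CC   -!- " followed by
    "TOPIC: text", and continuation lines start with "CC" plus seven
    spaces. Returns {topic: [text, ...]}; a topic that occurs more than
    once in the entry gets one list item per occurrence.
    """
    topics = {}
    current = None  # topic that continuation lines are appended to
    for line in lines:
        if line.startswith("CC   -!- "):
            # Split "TOPIC: text" on the first colon only.
            name, _, text = line[9:].partition(":")
            current = name.strip()
            topics.setdefault(current, []).append(text.strip())
        elif line.startswith("CC       ") and current is not None:
            # Continuation of the most recent topic: join with one space.
            topics[current][-1] += " " + line[9:].strip()
        else:
            # Any other line (other CC content or other line types) ends the topic.
            current = None
    return topics


# Hypothetical sample entry; the text is placeholder, not real annotation.
SAMPLE_RECORD = [
    "CC   -!- FUNCTION: Placeholder function text.",
    "CC   -!- TOXIC DOSE: LD(50) is 1.0 mg/kg by intravenous",
    "CC       injection (placeholder value).",
    "//",
]

for topic, texts in parse_cc_topics(SAMPLE_RECORD).items():
    print(f"{topic}: {texts}")
```

Keeping each topic as a list means an entry with two comments under the same topic does not silently lose the first one.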
UniProt also introduced a documentation file for strains and their synonyms, and improved its integration with structural databases through residue-level mapping to the Protein Data Bank. The UniRef databases provide non-redundant collections of sequences drawn from the UniProt Knowledgebase and UniParc, making sequence searches faster while preserving coverage of sequence space. UniProt is available online and by FTP, with new releases every two weeks.
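To make the idea of a non-redundant collection concrete, here is a deliberately simplified sketch that merges entries with identical sequences into a single cluster with one representative. It is an illustration under stated assumptions, not the UniRef construction procedure: it only collapses exact duplicates (case-insensitively), the representative rule (alphabetically first accession) is arbitrary, and the function name `collapse_identical`, the accessions and the sequences are all made up.

```python
from collections import defaultdict


def collapse_identical(entries):
    """Merge entries whose sequences are identical into one cluster.

    entries: iterable of (accession, sequence) pairs.
    Returns {representative: sorted member accessions}, where the
    representative is the alphabetically first member (an arbitrary
    rule chosen for this sketch).
    """
    by_sequence = defaultdict(list)
    for accession, sequence in entries:
        # Compare sequences case-insensitively.
        by_sequence[sequence.upper()].append(accession)

    clusters = {}
    for members in by_sequence.values():
        members.sort()
        clusters[members[0]] = members
    return clusters


# Hypothetical accessions and sequences, for illustration only.
ENTRIES = [("ACC3", "MKTAYIAK"), ("ACC1", "MKTAYIAK"), ("ACC2", "MSTNPKPQ")]
print(collapse_identical(ENTRIES))
```

Searching only the representatives scans fewer sequences, while the member lists keep every original accession reachable from its cluster.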