2015 | Baris E. Suzek, Yuqi Wang, Hongzhan Huang, Peter B. McGarvey, Cathy H. Wu and the UniProt Consortium
The article discusses the UniRef databases, which are comprehensive and scalable alternatives to native sequence databases for improving sequence similarity searches. UniRef databases provide clustered sets of sequences from the UniProt Knowledgebase and selected UniParc records, reducing redundancy while preserving information on source and quality annotation. The authors analyze the intra-cluster molecular function consistency using Gene Ontology (GO) terms, finding that over 97% of UniRef90 and UniRef50 clusters bring together proteins with identical or common molecular functions. They also compare the performance of UniRef50-based and UniProtKB-based BLASTP searches, showing that UniRef50-based searches are faster, more concise, and more sensitive in detecting remote similarities. The results support the use of UniRef databases for functional annotation and highlight their reliability and efficiency in sequence similarity searches.
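The idea of intra-cluster molecular function consistency can be illustrated with a small sketch. It is not the authors' procedure: it assumes each cluster member carries a flat set of GO molecular function term IDs, and it ignores the GO hierarchy, which a real analysis would likely take into account. The function name `classify_cluster`, the labels it returns, and the member names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def classify_cluster(member_go: Dict[str, Set[str]]) -> str:
    """Label a cluster by how the GO term sets of its annotated members agree.

    Assumption: terms are compared as flat IDs, with no ancestor lookup.
    """
    # Only members with at least one GO term can be compared.
    annotated = [terms for terms in member_go.values() if terms]
    if len(annotated) < 2:
        return "not assessable"

    # "identical": every annotated member has exactly the same term set.
    first = annotated[0]
    if all(terms == first for terms in annotated):
        return "identical"

    # "common": at least one term is shared by all annotated members.
    if set.intersection(*annotated):
        return "common"

    return "inconsistent"


# Toy cluster with hypothetical member names.
toy_cluster = {
    "member_1": {"GO:0003824", "GO:0016787"},
    "member_2": {"GO:0003824"},
    "member_3": set(),  # unannotated members are skipped
}
print(classify_cluster(toy_cluster))
```

Applied across all clusters, the share labelled "identical" or "common" would be a rough analogue of the consistency figure the summary reports, under the simplifications stated above.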