Pfam: clans, web tools and services

2006 | Robert D. Finn*, Jaina Mistry, Benjamin Schuster-Böckler, Sam Griffiths-Jones, Volker Hollich, Timo Lassmann¹, Simon Moxon, Mhairi Marshall, Ajay Khanna², Richard Durbin, Sean R. Eddy², Erik L. L. Sonnhammer¹ and Alex Bateman
Pfam is a database of protein families containing 7973 entries (release 18.0). A recent development allows related families to be grouped into clans. Pfam clans are described, along with new web pages and improved web tools and services. Pfam is available on the web in the UK, USA, France, and Sweden.

Pfam is a comprehensive database of protein families, containing 7973 families in the current release (18.0). Each family is manually curated and represented by two multiple sequence alignments, two profile-HMMs, and an annotation file. Pfam families are periodically updated; on average, each family has been modified four times since its creation. The data and additional features are accessible via four websites.

Several new features have been added to Pfam in the past two years. The main focus of this paper is a change in Pfam philosophy that allows protein families to be grouped into a hierarchical classification of clans. New web tools and Pfam web services are also described. An additional feature, iPfam, a sister database containing details of Pfam domain-domain interactions, has been described in a recent publication.

Pfam has grown by 1783 families since release 10.0. Although the underlying sequence database has nearly doubled in size over the past two years, the fraction of sequences in UniProt that match a Pfam family remains at 75%. One of the main uses of Pfam is genome annotation, so an important measure is proteome coverage: the coverage of the nonredundant set of proteins encoded by a genome. Table 1 shows the increase in Pfam coverage of a selected set of proteomes since Pfam began 9 years ago.

Pfam clans are a hierarchical classification of related protein families. Clans help improve the annotation of families. For example, knowing the 3D structure of a domain is an essential part of understanding the biology of that domain, and Pfam clans are helping to identify previously undetected structural homologues.
Currently, 66% of all families in clans contain at least one sequence with a known 3D structure. A further 418 families (30%) that lack a structural homologue within the family belong to a clan in which at least one family contains a known 3D structure. Pfam clans provide a hierarchical view of a diverse range of protein families. Pfam clans relate to other classifications of protein families, such as SCOP and SUPFAM. Pfam clans are not confined to those families with a known 3D structure. Some Pfam clans contain groups of related families in which none of the members has a determined 3D structure.

There are three different ways of accessing the clan information. First, there is an additional release flatfile, Pfam-C, which contains all of the clan information. Second, all of the