2014 | Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhard, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate and Marco Punta
Pfam is a widely used database of protein families, containing 14,831 manually curated entries in version 27.0. Since its last update in 2011, Pfam has added 1,182 new families and maintained nearly 80% sequence coverage of the UniProt Knowledgebase (UniProtKB), despite a 50% increase in the size of the underlying sequence database. Beyond the basic family data, Pfam now provides family alignments based on four different representative proteome sequence datasets and a new interactive DNA search interface. The article also discusses the mapping between Pfam and known 3D structures.
Pfam is a database of curated protein families, each defined by two alignments and a profile hidden Markov model (HMM). Profile HMMs are probabilistic models used for statistical inference of homology. Pfam-A entries are curated families, while Pfam-B entries are automatically generated from sequence clusters not covered by Pfam-A. Pfam data are available in various formats, including flatfiles and relational table dumps, and can be downloaded from the FTP site. The Pfam website provides different ways to access the database content, including graphical representations and interactive access.
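As an illustration of working with the downloadable flatfiles, the sketch below reads per-family annotation lines from a Stockholm-format alignment file. It assumes the flatfile uses Stockholm-style "#=GF" annotation lines (such as ID, AC and DE) and ends each family record with "//"; the function name iter_family_headers and the sample accession are hypothetical and not part of any Pfam tooling.

```python
from typing import Dict, Iterable, Iterator


def iter_family_headers(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict of #=GF annotations per Stockholm record.

    Assumption: records are terminated by a line containing only "//".
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#=GF "):
            # Layout assumed here: "#=GF <tag> <free text>"
            parts = line.split(None, 2)
            if len(parts) == 3:
                tag, value = parts[1], parts[2]
                # Repeated tags (e.g. multi-line comments) are joined with a space.
                record[tag] = f"{record[tag]} {value}" if tag in record else value
        elif line.strip() == "//":
            if record:
                yield record
            record = {}


# Hypothetical record, made up for illustration only.
sample = [
    "# STOCKHOLM 1.0",
    "#=GF ID   Example_fam",
    "#=GF AC   PF99999.1",
    "#=GF DE   Hypothetical example family",
    "seq1/1-5  ACDEF",
    "//",
]
headers = list(iter_family_headers(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a large flatfile is processed one record at a time rather than loaded into memory.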
In 2012, Pfam introduced Wikipedia as a platform for community-based functional annotation. Since release 26.0, Pfam has linked as many Pfam-A families as possible to Wikipedia articles. The number of families linking to Wikipedia articles increased from 4,942 in 26.0 to 5,663 in 27.0. Pfam has also removed some features that were no longer useful, enhanced others, and developed new ones to meet the changing demands of computational biology.
In 2013, Pfam introduced four additional alignments based on representative proteomes (RPs), which contain decreasing amounts of sequence redundancy. These alignments allow for more manageable and useful samples of sequence diversity within a family. Pfam also introduced a new interactive DNA search interface, which allows users to search for Pfam-A families in DNA sequences. Pfam has also improved the accessibility of proteome data, providing a list of all Pfam-A matches per proteome on its FTP site.
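Searching DNA against protein family models requires turning the nucleotide sequence into candidate protein sequences first. One common way to do this is six-frame translation, sketched below using the standard genetic code. This is an assumption made for illustration and not a description of how the Pfam interface is implemented; all function names are hypothetical.

```python
from typing import Dict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Translate three forward and three reverse-strand reading frames."""
    frames: Dict[str, str] = {}
    rc = reverse_complement(dna)
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translations("ATGGCC")
```

Each of the six resulting protein strings could then be searched against protein family models in the usual way; stop codons appear as "*" in the output.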
Pfam has also improved the representation of intrinsic sequence disorder, incorporating IUPred predictions for all Pfamseq sequences. These data are stored in the MySQL database and displayed graphically on the website. Pfam has also improved the mapping of Pfam-A entries to protein structures, using the SIFTS resource to unify structural and sequence information.
Pfam continues to grow and evolve, with efforts concentrated on adding new families and improving existing ones, while also trying to make the core family data as accessible as possible. Pfam is committed to producing more frequent releases, a process which may result in further changes to the database and website. Funding for Pfam comes from various sources.