The Pfam protein families database in 2019

2019 | Sara El-Gebali, Jaina Mistry, Alex Bateman, Sean R. Eddy, Aurélien Luciani, Simon C. Potter, Matloob Qureshi, Lorna J. Richardson, Gustavo A. Salazar, Alfredo Smart, Erik L.L. Sonnhammer, Layla Hirsh, Lisanna Paladin, Damiano Piovesan, Silvio C.E. Tosatto and Robert D. Finn
The Pfam protein families database, now at version 32.0, has grown to a total of 17,929 entries. This release reflects improvements in family definitions, domain boundaries, and functional annotation, together with closer collaboration with other resources such as RepeatsDB and ECOD. Pfam aims to cover as much protein sequence as possible with the fewest models, under the constraint that no two entries overlap. Each entry is assigned one of six types: family, domain, motif, repeat, coiled coil, or disordered. Family and domain entries together account for over 97% of all entries.

Sequence and residue coverage of UniProtKB has remained relatively stable, while coverage of the reference proteomes in pfamseq has increased. Pfam 32.0 covers 74.5% of sequences and 50.1% of residues in reference proteomes.

Pfam entries are now connected to the Sequence Ontology (SO) through a mapping of the Pfam entry types to SO terms. Separately, Pfam links the authorship of its entries to the authors' ORCID identifiers, so that contributors can claim credit for their curation work.

A comparison with the Evolutionary Classification of Protein Domains (ECOD) database led to the creation of 825 new Pfam entries. It also helped refine the definitions of existing families and improve the consistency of Pfam domains with known structures. Pfam clans, which group evolutionarily related families, have been expanded as well, with 74 new clans added since Pfam 29.0.

The definitions of repeat families have been refined, particularly through collaboration with RepeatsDB, which resulted in both new entries and revisions to existing ones. Annotations for domains of unknown function (DUFs) have also been extended, improving their functional characterization and linking them to other databases. Overall, Pfam 32.0 improves on earlier releases in coverage, annotation quality, integration with other resources, and recognition of contributors.
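The two coverage figures quoted above measure different things: sequence coverage is the fraction of sequences with at least one Pfam match, while residue coverage is the fraction of all residues that fall inside a matched region. The following is a minimal sketch of that arithmetic on toy data. The function names, the sequence identifiers, and the input layout (1-based inclusive start and end positions per sequence) are assumptions made for illustration and are not taken from the Pfam pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a match: 1-based inclusive (start, end).
Region = Tuple[int, int]


def covered_residues(regions: List[Region]) -> int:
    """Count residues covered by at least one region, counting overlaps once."""
    total = 0
    current_end = 0  # last residue position already counted
    for start, end in sorted(regions):
        start = max(start, current_end + 1)  # skip residues already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def coverage(
    lengths: Dict[str, int], matches: Dict[str, List[Region]]
) -> Tuple[float, float]:
    """Return (sequence coverage, residue coverage) as fractions in [0, 1]."""
    n_sequences = len(lengths)
    n_residues = sum(lengths.values())
    sequences_hit = sum(1 for seq_id in lengths if matches.get(seq_id))
    residues_hit = sum(
        covered_residues(matches.get(seq_id, [])) for seq_id in lengths
    )
    return sequences_hit / n_sequences, residues_hit / n_residues


if __name__ == "__main__":
    # Toy data: four sequences, three of which have at least one match.
    toy_lengths = {"seqA": 100, "seqB": 200, "seqC": 100, "seqD": 100}
    toy_matches = {
        "seqA": [(1, 50)],
        "seqB": [(11, 60), (101, 150)],
        "seqC": [(1, 100)],
    }
    seq_cov, res_cov = coverage(toy_lengths, toy_matches)
    print(f"sequence coverage: {seq_cov:.1%}, residue coverage: {res_cov:.1%}")
```

On this toy input, three of four sequences are matched but only half of the residues are, which illustrates why residue coverage is typically the lower of the two numbers.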