The Pfam protein families database: towards a more sustainable future

The Pfam protein families database has undergone significant changes to improve its sustainability and efficiency and to help it cope with the exponential growth of sequence databases. Pfam now bases its sequence database primarily on UniProtKB reference proteomes, a more stable and curated set of sequences that reduces the need for manual curation. As a result, fewer sequences are displayed on the website, although many important model organisms remain accessible. Pfam-B, an automatically generated supplement, has been removed. The current release, Pfam 29.0, includes 16,295 entries and 559 clans.

Pfam entries are built from representative subsets of sequences, which are aligned to create seed alignments. Each seed alignment is used to construct a profile hidden Markov model (HMM), which is then searched against sequence databases. Related entries are grouped into clans, and relationships are identified using sequence information, known protein structures, and HMM-HMM comparisons.

The use of reference proteomes has significantly reduced the size of the Pfam sequence database, making it more manageable for biocurators and users. It has also improved the stability of Pfam seed alignments, because reference proteomes are generally more stable and come from higher-quality complete genomes. New entry types, such as 'Disordered' and 'Coiled-coil', better represent certain protein regions. Overlaps between entries are handled better, allowing a more accurate representation of relationships between Pfam entries. A new interactive JavaScript graph viewer displays the relationships between families within a clan, making them easier for users to understand. Pfam annotations are available for all of UniProtKB, and Pfam entries are mapped to known structures using the SIFTS mapping.

Overall, these changes enhance the sustainability, efficiency, and usability of Pfam. They have allowed more frequent releases and improved the accuracy and stability of Pfam entries. The database continues to evolve to meet the needs of its users and to provide accurate and reliable information about protein families.
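The step from a seed alignment to a searchable model can be illustrated with a deliberately simplified sketch. It does not build a profile HMM: it only computes per-column residue frequencies from a toy alignment and scores an ungapped sequence against them, which is the basic idea that a profile HMM extends with insert and delete states. The sequences, the function names `column_profile` and `log_odds_score`, the uniform background, and the pseudocount value are all assumptions made for illustration and are not taken from Pfam or its software.

```python
import math
from collections import Counter

# Toy seed alignment (hypothetical sequences; '-' marks a gap).
SEED = [
    "ACDE-G",
    "ACDEFG",
    "SCDE-G",
    "ACNEFG",
]

# The 20 standard amino acids, one-letter codes.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> probability.

    Gaps are ignored; a uniform pseudocount keeps unseen residues non-zero.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in ALPHABET}
        )
    return profile


def log_odds_score(profile, sequence):
    """Score an ungapped sequence, in bits, against a uniform background.

    The sequence must have exactly one residue per profile column.
    """
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    background = 1.0 / len(ALPHABET)
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, sequence)
    )


profile = column_profile(SEED)
member_score = log_odds_score(profile, "ACDEFG")  # resembles the seed
random_score = log_odds_score(profile, "WWWWWW")  # unrelated sequence
print(f"member: {member_score:.2f} bits, unrelated: {random_score:.2f} bits")
```

A sequence resembling the seed scores above zero and an unrelated one below it. In practice a tool such as HMMER builds and searches real profile HMMs, which also model insertions and deletions and use more careful priors than the flat pseudocount assumed here.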

The Pfam protein families database: towards a more sustainable future

2016 | Robert D. Finn, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Jaina Mistry, Alex L. Mitchell, Simon C. Potter, Marco Punta, Matloob Qureshi, Amaia Sangrador-Vegas, Gustavo A. Salazar, John Tate and Alex Bateman