20 years of the SMART protein domain annotation resource

2018 | Ivica Letunic and Peer Bork
SMART (Simple Modular Architecture Research Tool) is a web resource for identifying and annotating protein domains and for analyzing domain architectures. For its 20th year, SMART has received new features and updated technologies. Version 8 includes over 1300 manually curated domain models, more than 100 of which were added since the last update. Synchronizing the underlying protein databases with UniProt, Ensembl, and STRING has doubled the number of annotated domains and other protein features to over 200 million. SMART's vector-based display engine has been extended to all protein schematics and rewritten using current web technologies, and the internal full-text search engine has been redesigned for faster searches. The resource combines manually curated hidden Markov models (HMMs) with a web-based interface for analysis and visualization. After 20 years it remains widely used, with nearly 50,000 distinct users per month.
The domain collection now totals 1302 models. Building them takes substantial manual work: creating high-quality multiple sequence alignments and selecting a cutoff value specific to each domain. Many of these domains are also annotated by other databases such as Pfam, but SMART's manual annotation pipeline yields different protein annotations, which can aid hypothesis generation.
The main protein database combines the complete UniProt with all stable Ensembl proteomes and contains over 50 million proteins from 460,000 species. To reduce redundancy, proteins are clustered within each species, producing 2.9 million multi-protein clusters. SMART also offers a 'genomic' analysis mode restricted to proteins from completely sequenced genomes.
The new vector-based display engine for protein schematics supports zooming and export to high-resolution images. In the interactive viewer, users can select and highlight protein features and submit selected regions for further BLAST analysis.
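Zooming and high-resolution export come from drawing schematics as vector shapes rather than pixels. The Python sketch below illustrates that idea only; it is not SMART's display engine. The function name `protein_svg`, the 1-based inclusive domain coordinates, the scale factor, and the styling are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def protein_svg(length, domains, px_per_residue=0.5, height=40):
    """Draw a protein as a backbone line with one rounded box per domain.

    length: protein length in residues.
    domains: (name, start, end) tuples, 1-based inclusive coordinates (assumed).
    Returns an SVG document as a string.
    """
    width = length * px_per_residue
    mid = height / 2
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": f"{width:g}", "height": f"{height:g}",
        "viewBox": f"0 0 {width:g} {height:g}",
    })
    # Backbone: the full-length chain as a horizontal line.
    ET.SubElement(svg, "line", {
        "x1": "0", "y1": f"{mid:g}", "x2": f"{width:g}", "y2": f"{mid:g}",
        "stroke": "black", "stroke-width": "2",
    })
    for name, start, end in domains:
        x = (start - 1) * px_per_residue        # left edge of residue `start`
        w = (end - start + 1) * px_per_residue  # covers residues start..end
        box = ET.SubElement(svg, "rect", {
            "x": f"{x:g}", "y": f"{height * 0.2:g}",
            "width": f"{w:g}", "height": f"{height * 0.6:g}",
            "rx": "6", "fill": "lightsteelblue", "stroke": "black",
        })
        # Browsers show <title> as a tooltip when hovering over the box.
        ET.SubElement(box, "title").text = f"{name} {start}-{end}"
        label = ET.SubElement(svg, "text", {
            "x": f"{x + w / 2:g}", "y": f"{mid:g}",
            "text-anchor": "middle", "dominant-baseline": "middle",
            "font-size": "10",
        })
        label.text = name
    # ElementTree escapes text and attribute values during serialization.
    return ET.tostring(svg, encoding="unicode")


print(protein_svg(300, [("SH3", 10, 70), ("SH2", 90, 180)]))
```

Because the output is plain SVG, a browser can scale it to any zoom level without pixelation, and the `<title>` on each box gives a simple hover tooltip, a much lighter stand-in for the popup dialogs SMART uses.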
The viewer shows detailed information about each protein feature in floating popup dialogs.
Protein orthology data are parsed from the eggNOG database and cover over 7.5 million proteins from more than 3500 species. Post-translational modification data are synchronized with the PTMcode database, and protein interaction data have been updated to STRING version 10.5.
Domain architecture analysis lets users retrieve proteins with specific domain combinations and export them as FASTA files or phylogenetic trees. The tree export has been rewritten for compatibility with version 3 of the Interactive Tree of Life (iTOL).
SMART's backend is a PostgreSQL relational database that stores annotation data, taxonomy information, and pre-calculated protein analyses; its full-text search engine has been updated for faster searches.
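A domain architecture query can be modeled as filtering proteins by the ordered list of domains that pass their model-specific cutoffs, then exporting the matches as FASTA. The Python sketch below is a minimal illustration under stated assumptions: the `Hit` and `Protein` records, the cutoff values, and the unordered "contains all of these domains" rule are hypothetical. SMART answers such queries from its PostgreSQL database, and its exact matching rules are not described here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One HMM match on a protein (hypothetical record layout)."""
    domain: str   # domain model name, e.g. "SH3"
    start: int    # 1-based start position on the sequence
    end: int      # 1-based inclusive end position
    score: float  # bit score from the HMM search


@dataclass
class Protein:
    pid: str
    species: str
    sequence: str
    hits: list = field(default_factory=list)


# Hypothetical per-model cutoffs; SMART curates its own value for each model.
CUTOFFS = {"SH3": 20.0, "SH2": 25.0, "PH": 18.0}


def architecture(protein):
    """Domain names ordered N- to C-terminally, keeping only hits at or above
    their model's cutoff. Models without a cutoff entry are never accepted."""
    kept = [h for h in protein.hits
            if h.score >= CUTOFFS.get(h.domain, float("inf"))]
    return [h.domain for h in sorted(kept, key=lambda h: h.start)]


def with_domains(proteins, required):
    """Proteins whose architecture contains every required domain, in any
    order; a domain listed twice must occur at least twice."""
    need = Counter(required)
    # Counter subtraction drops non-positive counts, so an empty result
    # means every requirement is met.
    return [p for p in proteins if not need - Counter(architecture(p))]


def to_fasta(proteins, width=60):
    """FASTA text with the architecture in each header, sequences wrapped at `width`."""
    lines = []
    for p in proteins:
        lines.append(f">{p.pid} {p.species} {'-'.join(architecture(p))}")
        lines.extend(p.sequence[i:i + width]
                     for i in range(0, len(p.sequence), width))
    return "".join(line + "\n" for line in lines)


# Illustrative data only: identifiers, sequences and scores are made up.
example = [
    Protein("P1", "Homo sapiens", "M" + "A" * 159,
            [Hit("SH2", 60, 150, 40.1), Hit("SH3", 5, 55, 31.0)]),
    Protein("P2", "Mus musculus", "M" + "G" * 159,
            [Hit("SH3", 10, 60, 12.5),   # below the SH3 cutoff, so dropped
             Hit("SH2", 70, 150, 30.0)]),
]

print(to_fasta(with_domains(example, ["SH3", "SH2"])), end="")
```

Comparing multisets with `Counter` makes the query order-insensitive; an order-sensitive variant would match against the list returned by `architecture` instead.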
SMART continues to expand its coverage, keep up with web standards, and implement new features to enhance user experience.