Understanding CDD/SPARCLE: functional classification of proteins via subfamily domain architectures

The Conserved Domain Database (CDD) is a resource for annotating biomolecular sequences with the locations of evolutionarily conserved protein domain footprints and inferred functional sites. CDD maintains a pre-computed archive of domain annotations for proteins tracked by NCBI's Entrez database and also offers live search services. It is part of NCBI's Entrez system and is cross-linked with other databases.

CDD curators supplement a comprehensive collection of protein domain and family models with in-house curated domain families, organized into hierarchical classifications. CDD annotates over 250 million sequences in Entrez/protein, including 96% of structure-derived sequences longer than 30 residues. Curators also annotate functional sites, such as active sites and binding sites, on NCBI-curated models: 29,991 site annotations have been created on 10,605 of the 12,805 NCBI-curated domain models. In addition, CDD has integrated over 1,000 domain signatures from InterPro.

CDD supports comparative analyses of protein families via conserved domain architectures. SPARCLE, its tool for the functional characterization of subfamily domain architectures (SDAs), associates domain architectures with functional descriptions. Curators have assigned names and functional labels to more than 6,500 SDAs, and these labels are shown on the results pages for user queries. SDAs vary in coverage and functional diversity, and the resolution of the protein classification depends on the availability of sufficiently specific models (the "reagents") in CDD. Curation currently focuses on domain architectures found in bacterial genomes, and methods for automating name and label assignment are under investigation. Automatically assigned names and labels will be flagged as unvalidated.

CDD is funded by the National Library of Medicine's Intramural Research Program.
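The idea of attaching a functional label to a domain architecture can be illustrated with a minimal Python sketch. This is not SPARCLE's implementation or data model: the model identifiers (`MODEL_A`, `MODEL_B`, `MODEL_C`), the label texts, and the helper names `architecture_key` and `label_for` are all hypothetical. The sketch assumes an architecture can be represented as the ordered tuple of domain models hit along a protein, and that automatically assigned labels carry an "unvalidated" flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchitectureLabel:
    name: str        # functional label shown to the user
    validated: bool  # False for automatically assigned labels (assumption)


# Hypothetical lookup table: ordered domain-model tuple -> label.
# The identifiers and label texts are placeholders, not real CDD records.
LABELS = {
    ("MODEL_A", "MODEL_B"): ArchitectureLabel("example two-domain protein", True),
    ("MODEL_C",): ArchitectureLabel("example single-domain protein", False),
}


def architecture_key(hits: list[tuple[int, str]]) -> tuple[str, ...]:
    """Turn (start_position, model_id) hits into an N-to-C ordered tuple of model ids."""
    return tuple(model_id for _, model_id in sorted(hits))


def label_for(hits: list[tuple[int, str]]) -> Optional[str]:
    """Return the label for an architecture, marking unvalidated ones; None if unknown."""
    label = LABELS.get(architecture_key(hits))
    if label is None:
        return None
    return label.name if label.validated else f"{label.name} (unvalidated)"
```

For instance, `label_for([(120, "MODEL_B"), (5, "MODEL_A")])` sorts the hits by start position before the lookup, so the order in which hits are reported does not matter. An architecture that is absent from the table yields `None` rather than a guessed label.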
No conflicts of interest are declared.
