Understanding CDD: conserved domains and protein three-dimensional structure

The Conserved Domain Database (CDD) is part of the NCBI Entrez system. It annotates protein sequences with the footprints of conserved domains and with functional sites. CDD uses RPS-BLAST to identify putative domain matches rapidly, and pre-computed annotations are available through Entrez. The database incorporates collections of both domain models and full-length protein models. It is backed by an active curation effort that uses 3D protein structures and the published literature to classify domain families. CDD curators also characterize novel families that emerge from protein structure determination efforts.

CDD's domain models are based on sequence and structure analyses, and 3D structures are used to define the core blocks of the multiple sequence alignments (MSAs) behind the models. CDD's hierarchical classifications are cross-validated against published literature and against computational classifications.

The current version of CDD (v3.08) contains 43,212 alignment models, 8,566 of which were curated by NCBI. It also includes models from Pfam, SMART, COG, TIGRFAMs, and the NCBI Protein Clusters database. CDD covers about 76% of Entrez protein sequences, and 94 domain superfamilies are curated entirely by NCBI. Over 5,000 of CDD's domain superfamilies are linked to 3D structures, meaning that about 41% of domain superfamilies have a known 3D structure.

The CD-Search service also accepts nucleotide sequences as queries: it translates them in all six reading frames and searches the translations against the model database. CDD provides functional site annotations, which CD-Search maps onto query protein sequences. These annotations are pre-computed for Entrez proteins and shown in GenPept views. The addition of sequence motifs has made functional site mapping more accurate. Annotation data can be retrieved with CD-Search, or with Batch CD-Search for larger datasets. The CDART service, which analyzes domain architectures, has improved performance and an improved user interface.

CDD draws on data from several other databases and resources, including Pfam, SMART, COGs, TIGRFAMs, and the NCBI Protein Clusters database.
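CD-Search's handling of nucleotide queries rests on six-frame translation: each of the two strands can be read into codons starting at three different offsets. The Python sketch below illustrates this general technique under the standard genetic code (NCBI translation table 1). It is not CD-Search's own code; the function names are hypothetical, codons containing ambiguous bases are rendered as 'X', and stop codons appear as '*'. How CD-Search treats stop codons during its search is not covered here.

```python
# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in TCAG order, so codon (i, j, k) maps to
# AMINO_ACIDS[16*i + 4*j + k]; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Complement mapping; 'N' (unknown base) stays 'N'.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(nucleotides: str) -> dict:
    """Hypothetical helper: translate a DNA/RNA sequence in all six frames.

    Frames +1..+3 read the given strand from offsets 0..2; frames -1..-3
    read the reverse complement from offsets 0..2.
    """
    seq = nucleotides.upper().replace("U", "T")  # accept RNA input too
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

For the input "ATGGCCTAA", frame +1 reads "MA*" (Met, Ala, stop), and frame -1 reads the reverse complement "TTAGGCCAT" as "LGH". In a search setting, each of the six translations would then be compared against the protein models.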
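Mapping a functional site onto a query means transferring positions annotated on a domain model to the query residues aligned with them. The Python sketch below illustrates that idea for a single pairwise alignment. It assumes the alignment is given as two equal-length strings with '-' for gaps, and that site positions are 1-based model coordinates. The function `map_sites` and its parameters are hypothetical and not part of any NCBI API. The sketch also does not model the sequence-motif checks that CDD uses to improve mapping accuracy.

```python
def map_sites(model_aln, query_aln, site_positions, model_start=1, query_start=1):
    """Hypothetical helper: map 1-based model positions to query positions.

    Returns {model_position: (query_position, query_residue)} for each site
    position, or None for positions that align to a gap in the query.
    """
    if len(model_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")

    column_map = {}
    m_pos, q_pos = model_start, query_start
    for m_char, q_char in zip(model_aln, query_aln):
        # Only columns where both sequences have a residue produce a mapping.
        if m_char != "-" and q_char != "-":
            column_map[m_pos] = (q_pos, q_char)
        # Advance each coordinate only when its sequence consumed a residue.
        if m_char != "-":
            m_pos += 1
        if q_char != "-":
            q_pos += 1

    return {site: column_map.get(site) for site in site_positions}


if __name__ == "__main__":
    # Model position 3 ('D') aligns to a gap in the query, so it is unmapped.
    print(map_sites("ACDE-FGH", "AC-EKFGH", [3, 4, 7]))
```

Here model position 4 maps to query residue 3 ('E') and position 7 maps to query residue 7 ('H'), while position 3 falls on a query gap. Passing `query_start` lets the query coordinates refer to a sub-range of a longer protein.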
CDD is funded by the National Library of Medicine's Intramural Research Program.
