CDD: conserved domains and protein three-dimensional structure

2013, Vol. 41, Database issue | Aron Marchler-Bauer*, Chanjuan Zheng, Farideh Chitsaz, Myra K. Derbyshire, Lewis Y. Geer, Renata C. Geer, Noreen R. Gonzales, Marc Gwadz, David I. Hurwitz, Christopher J. Lanczycki, Fu Lu, Shennan Lu, Gabriele H. Marchler, James S. Song, Narmada Thanki, Roxanne A. Yamashita, Dachuan Zhang and Stephen H. Bryant
The Conserved Domain Database (CDD) is a component of NCBI's Entrez query and retrieval system. It annotates protein sequences with the locations of conserved domain footprints and functional sites. CDD offers pre-computed annotations as well as interactive search services, based on RPS-BLAST, that accept single or batch submissions of protein or nucleotide queries. It incorporates several collections of protein domain and full-length protein models and maintains an active curation effort that provides fine-grained classifications for major, well-characterized protein domain families, supported by 3D structures and the published literature.
Most protein 3D structures are represented by models tracked by CDD, and curators are actively characterizing novel families emerging from protein structure determination efforts. Models tracked by CDD annotate about 76% of protein sequences in Entrez and a higher fraction (94%) of structure-linked proteins. The database contains 43,212 alignment models, 8,566 of which are curated by NCBI. CDD clusters single-domain models into superfamilies and assigns each superfamily a unique accession starting with 'cl'. As of version 3.08, 5,007 of 12,307 single-domain superfamilies are linked to one or more 3D structures, so 3D structures are known for at least 41% of protein domain superfamilies.
CDD assigns high confidence to certain matches between protein query sequences and domain models by labeling them specific hits: a specific hit must be the highest-ranked match and its score must cross a model-specific threshold. The CD-Search service now accepts nucleotide sequences as queries and translates them into polypeptide sequences before searching the model database. Functional site annotations are also available, with 18,263 sites recorded on 7,382 models, and adding sequence motifs/patterns has improved the specificity of site mapping.
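The two conditions for a specific hit described above (highest-ranked match, score above a model-specific threshold) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Hit` record, the `classify_hits` function, the model names, the threshold values, the use of bit scores for ranking, and the reading of "highest-ranked" as "not overlapped by a better-scoring hit on the query". It is not CDD's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query-to-model match (hypothetical record, not a CDD data type)."""
    model: str        # illustrative model name, not a real accession
    start: int        # first aligned query residue, 1-based, inclusive
    end: int          # last aligned query residue, inclusive
    bit_score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits cover at least one common query position."""
    return a.start <= b.end and b.start <= a.end


def classify_hits(hits, thresholds):
    """Return (hit, label) pairs, label being 'specific' or 'non-specific'.

    Assumption: a hit counts as highest-ranked when no overlapping hit has
    a higher bit score. Models missing from `thresholds` never qualify.
    """
    labelled = []
    for hit in hits:
        outranked = any(
            other is not hit
            and other.bit_score > hit.bit_score
            and overlaps(hit, other)
            for other in hits
        )
        passes = hit.bit_score >= thresholds.get(hit.model, float("inf"))
        label = "specific" if (passes and not outranked) else "non-specific"
        labelled.append((hit, label))
    return labelled


if __name__ == "__main__":
    # Made-up example data, chosen only to exercise each branch.
    example_hits = [
        Hit("modelA", 10, 120, 150.0),   # best hit in its region, above threshold
        Hit("modelB", 15, 110, 90.0),    # above threshold but outranked by modelA
        Hit("modelC", 200, 300, 40.0),   # alone in its region but below threshold
    ]
    example_thresholds = {"modelA": 100.0, "modelB": 80.0, "modelC": 60.0}
    for hit, label in classify_hits(example_hits, example_thresholds):
        print(hit.model, label)
```

In this toy input only `modelA` is labeled specific: `modelB` fails the ranking condition and `modelC` fails the threshold condition, which shows that both tests must pass.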
CDD provides access to its annotation data through the CD-Search service and the Conserved Domain Architecture Retrieval Tool (CDART), which allows users to search for proteins with similar domain architectures. The article acknowledges contributions from various individuals and organizations and is funded by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/DHHS.
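The summary notes that CD-Search translates nucleotide queries into polypeptide sequences before searching the model database. A minimal sketch of such a step follows, assuming translation in all six reading frames with the standard genetic code; the summary itself does not spell out either detail, and the function names here are hypothetical rather than part of any NCBI tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Map frame labels '+1'..'+3' and '-1'..'-3' to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Short made-up query; '*' marks a stop codon.
    for frame, peptide in six_frame_translations("ATGGCCAAGTAA").items():
        print(frame, peptide)
```

Each translated frame could then be treated as a separate protein query against the domain models, so that a domain footprint is found regardless of the strand or frame in which it is encoded.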