Published online 29 November 2006 | Aron Marchler-Bauer*, John B. Anderson, Myra K. Derbyshire, Carol DeWeese-Scott, Noreen R. Gonzales, Marc Gwadz, Luning Hao, Siqian He, David I. Hurwitz, John D. Jackson, Zhaoxi Ke, Dmitri Krylov, Christopher J. Lanczycki, Cynthia A. Liebert, Chunlei Liu, Fu Lu, Shennan Lu, Gabriele H. Marchler, Mikhail Mullokandov, James S. Song, Narmada Thanki, Roxanne A. Yamashita, Jodie J. Yin, Dachuan Zhang and Stephen H. Bryant
The conserved domain database (CDD) is a primary resource for annotating conserved domain footprints on protein sequences in NCBI's Entrez database. CDD provides pre-computed domain annotations and supports interactive domain family analysis through its CD-Search service, which uses BLAST heuristics to search for conserved domain signatures in protein sequences. The CDD collection includes models from Pfam, SMART, COG, and curated NCBI models, organized into hierarchies reflecting common descent. The introduction highlights the importance of domain annotation in characterizing protein function and discusses the limitations of sequence similarity-based annotation methods.
The article also introduces CDTree, a helper application that allows users to examine curated domain hierarchies and view their query sequences in the context of phylogenetic trees. CDTree provides detailed information about domain models, sequence trees, and taxonomic scope, aiding in the confident transfer of functional annotations from domain models to protein sequences. The current version of CDD, v2.09, contains 12,422 models, with 2,494 curated at NCBI, covering about 69% of non-identical protein sequences in Entrez. The article concludes with acknowledgments and references.
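The model counts quoted for CDD v2.09 can be put in proportion with a short calculation. The sketch below is illustrative only: the function name `model_breakdown` is hypothetical, and treating every model that is not NCBI-curated as a single "other" group (presumably the models imported from Pfam, SMART, and COG) is an assumption, since the text does not give a per-source breakdown. The 69% sequence-coverage figure is a separate statistic and is not derived here.

```python
# Figures quoted in the text for CDD v2.09.
TOTAL_MODELS = 12_422
NCBI_CURATED = 2_494


def model_breakdown(total: int, curated: int) -> dict:
    """Split a model count into NCBI-curated models and all remaining models.

    Assumption: the remaining models are lumped into one "other" group;
    the source text does not say how they divide among Pfam, SMART, and COG.
    """
    if not 0 <= curated <= total:
        raise ValueError("curated must be between 0 and total")
    other = total - curated
    return {
        "curated": curated,
        "other": other,
        "curated_share": curated / total,
        "other_share": other / total,
    }


breakdown = model_breakdown(TOTAL_MODELS, NCBI_CURATED)
print(f"NCBI-curated models: {breakdown['curated']} ({breakdown['curated_share']:.1%})")
print(f"Other models:        {breakdown['other']} ({breakdown['other_share']:.1%})")
```

Under these assumptions, roughly one model in five is curated at NCBI, and the remaining four in five come from the other collections.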