CDD: a conserved domain database for interactive domain family analysis

The Conserved Domain Database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences. CDD provides a strategy for more accurate assessment of neighbor relationships, similar to phylogenomic inference. CDD contains models imported from Pfam, SMART, and COG, as well as domain models curated at NCBI. Curated models are organized into hierarchies of domains related by common descent. A novel helper application, CDTree, enables users to examine curated hierarchies. CDD and CDTree together serve as a powerful tool in protein classification, allowing users to analyze protein sequences in the context of domain family hierarchies.

CDD provides a search tool using reverse position-specific BLAST (RPS-BLAST), where query sequences are compared to databases of position-specific score matrices (PSSMs). When CDD is scanned with protein query sequences, a region on a query may pick up more than one overlapping footprint from a set of related models. One of those models provides the best score or lowest E-value, but that alone may not be sufficient to indicate that the query sequence is a bona fide member of the corresponding subfamily. CDD also contains imported models, which have not been curated at NCBI, and search results may present a mixture of curated and uncurated models.

CDTree is a helper application for the web browser and must be downloaded and installed on the user's computer. It functions as a viewer for curated protein domain hierarchies, retrieving data via the web browser, and also serves as a domain hierarchy editor. It uses a separate program, Cn3D, to view 3D structure and to display and edit multiple alignments of protein structure and sequence. Cn3D is distributed, installed, and configured along with CDTree. CDTree requires a recent version of Cn3D, version 4.2, which is contained in the CDTree installation package.
The installation package also contains a stand-alone application, 'fa2cd', which can be used to convert FASTA-formatted multiple sequence alignments into models stored in the 'CD' format.

CDD contains a total of 12,422 models, of which 2,494 have been curated at NCBI. Of these curated models, fewer than 300 are solitary domain models, while the rest are organized into hierarchies. The largest hierarchies contain well over 100 individual models each. CDD is available as part of NCBI's Entrez database and query system. Entries in CDD are cross-linked reciprocally to NCBI taxonomy, to citations in PubMed, and to protein sequences in Entrez. Links to protein sequences reflect the results of pre-computed RPS-BLAST searches and are updated on a daily basis.
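
The caveat about overlapping footprints can be illustrated with a minimal Python sketch. It is not CDD's or RPS-BLAST's actual procedure or output format: the `Hit` record, the model names, and the E-values are all hypothetical and chosen only for illustration. The sketch groups hits that overlap on the query, reports the lowest-E-value model for each region, and flags regions where several models compete, since the top hit alone may not establish subfamily membership.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One domain footprint on the query (hypothetical record, not an RPS-BLAST format)."""
    model: str      # illustrative model identifier
    start: int      # first query residue covered (1-based, inclusive)
    end: int        # last query residue covered (inclusive)
    evalue: float   # lower is better


def group_overlapping(hits: List[Hit]) -> List[List[Hit]]:
    """Cluster hits whose query intervals overlap, directly or transitively."""
    groups: List[List[Hit]] = []
    current_end = -1
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if groups and hit.start <= current_end:
            groups[-1].append(hit)
            current_end = max(current_end, hit.end)
        else:
            groups.append([hit])
            current_end = hit.end
    return groups


def summarize(hits: List[Hit]) -> List[dict]:
    """Report the lowest-E-value model per region plus the competing models."""
    report = []
    for group in group_overlapping(hits):
        best = min(group, key=lambda h: h.evalue)
        rivals = sorted(h.model for h in group if h is not best)
        report.append({
            "region": (min(h.start for h in group), max(h.end for h in group)),
            "best_model": best.model,
            "competing_models": rivals,
            # More than one footprint: the top hit alone does not settle membership.
            "ambiguous": len(group) > 1,
        })
    return report


# Made-up example data: two related models overlap on residues 10-130.
EXAMPLE_HITS = [
    Hit("model_A", 10, 120, 1e-30),
    Hit("model_A_sub1", 15, 130, 1e-35),
    Hit("model_B", 200, 280, 1e-8),
]

if __name__ == "__main__":
    for row in summarize(EXAMPLE_HITS):
        print(row)
```

A region flagged as ambiguous here is the kind of case where examining the query in the context of the curated domain hierarchy would be more informative than the ranking by E-value alone.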
