CDD: a Conserved Domain Database for the functional annotation of proteins

The Conserved Domain Database (CDD) is a resource for protein sequence annotation, providing information on conserved domain footprints and functional sites. CDD includes manually curated domain models that use protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. These models are organized hierarchically if they describe domain families related by common descent. CDD also imports domain family models from various external sources, making it a partially redundant collection. To simplify annotation, redundant models and homologous family models are clustered into superfamilies. Domain footprints are annotated with the corresponding superfamily designation, with specific annotations indicating high-confidence family membership. Pre-computed domain annotations are available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotations for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. CDD is part of NCBI's Entrez system and can be searched using the common Entrez interface. It is cross-linked with other databases such as Entrez Protein, PubMed, and NCBI BioSystems. CDD records are typically encountered through Conserved Domains links from Entrez/Protein sequence records and via protein BLAST and PSI-BLAST searches. The CD-Search tool visualizes live or pre-computed search results using RPS-BLAST, a variation of PSI-BLAST. CDD provides detailed domain annotations, including functional sites, which are recorded with evidence from experimental structures or literature. Functional site annotations are visible in the default display of sequence records in the Entrez/Protein database, and detailed descriptions can be examined on conserved domain summary pages. 
CDTree and Cn3D software can visualize conserved domain hierarchies, alignments, annotations, functional sites, and corresponding evidence. CDD can be used to compute and retrieve protein sequence annotations for large sets of query sequences. A novel interface, Batch CD-Search, facilitates processing up to 100,000 protein queries at a time. Queries can be supplied as protein GIs, accessions, or raw sequence data. Batch CD-Search compiles results, allowing users to extract various subsets of results. The data can be downloaded in various formats or displayed graphically within a web browser. CDD distributes pre-built search databases and individual PSSMs via the CDD FTP site, enabling the creation of special-purpose RPS-BLAST search databases.
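
Because Batch CD-Search accepts up to 100,000 protein queries at a time, a larger query set has to be divided before submission. The following is a minimal sketch of that preparation step only. The function name `split_into_batches` and the constant `MAX_QUERIES_PER_BATCH` are hypothetical names chosen for illustration, and the code does not contact any NCBI service; it only groups identifiers (GIs or accessions) into lists that each respect the stated limit.

```python
from typing import Iterable, Iterator, List

# Per-submission limit taken from the text above (100,000 protein queries).
MAX_QUERIES_PER_BATCH = 100_000


def split_into_batches(
    queries: Iterable[str],
    batch_size: int = MAX_QUERIES_PER_BATCH,
) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` non-empty query identifiers.

    Hypothetical helper: it performs no validation of the identifiers
    and makes no network requests.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[str] = []
    for query in queries:
        query = query.strip()
        if not query:
            continue  # skip blank lines in the input list
        batch.append(query)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    # Placeholder identifiers, not real accessions.
    example_queries = [f"QUERY_{i}" for i in range(250_000)]
    for number, chunk in enumerate(split_into_batches(example_queries), start=1):
        print(f"batch {number}: {len(chunk)} queries")
```

Each resulting list could then be submitted as a separate Batch CD-Search job, with the downloaded results combined afterwards.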
