dbCAN: a web resource for automated carbohydrate-active enzyme annotation

Published online 29 May 2012 | Yanbin Yin, Xizeng Mao, Jincai Yang, Xin Chen, Fenglou Mao and Ying Xu
dbCAN is a web resource for automated annotation of carbohydrate-active enzymes (CAZymes) in protein datasets, of particular interest to the biotechnology and biofuel sectors. The authors developed dbCAN to address limitations of existing resources such as CAZyDB and CAT, which lack clearly defined signature domains for CAZyme families and do not provide automated annotation. dbCAN defines a signature domain for each CAZyme family using the Conserved Domain Database (CDD) and literature curation, and builds a hidden Markov model (HMM) to represent each domain. These HMMs are then used to annotate any given genome or protein dataset automatically. The accuracy of the annotation was evaluated on the Clostridium thermocellum and Arabidopsis thaliana genomes, where it showed high sensitivity and positive predictive value. dbCAN also provides pre-computed sequence alignments, HMMs, and phylogenies, and its application to metagenome datasets identified over one million full-length CAZyme homologous proteins. The resource is freely available and aims to facilitate comprehensive, automated CAZyme annotation at genome scale.
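The two evaluation measures named above, sensitivity and positive predictive value (PPV), can be illustrated with a minimal Python sketch. It assumes the annotation result and the curated reference are each available as a set of protein identifiers; the function name `evaluate_annotation` and the toy identifiers are hypothetical and are not taken from dbCAN or from the genomes used in the paper.

```python
def evaluate_annotation(predicted, reference):
    """Return (sensitivity, PPV) for predicted vs. reference CAZyme protein IDs.

    Both arguments are iterables of protein identifiers (an assumption of
    this sketch; the paper's actual evaluation data format is not shown here).
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)    # predicted and in the reference
    false_pos = len(predicted - reference)   # predicted but not in the reference
    false_neg = len(reference - predicted)   # in the reference but missed

    # Sensitivity: fraction of reference CAZymes that were recovered.
    sensitivity = (true_pos / (true_pos + false_neg)
                   if true_pos + false_neg else float("nan"))
    # PPV: fraction of predictions that are in the reference.
    ppv = (true_pos / (true_pos + false_pos)
           if true_pos + false_pos else float("nan"))
    return sensitivity, ppv


# Hypothetical toy data, for illustration only.
reference = {"prot1", "prot2", "prot3", "prot4"}
predicted = {"prot2", "prot3", "prot4", "prot9"}

sens, ppv = evaluate_annotation(predicted, reference)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")  # sensitivity=0.75, PPV=0.75
```

In this toy case three of the four reference proteins are recovered and three of the four predictions are correct, so both measures come out at 0.75. Comparing at the level of whole proteins is a simplification; an evaluation could equally be done per family assignment.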