2016 | Minoru Kanehisa¹, Yoko Sato², Masayuki Kawashima², Miho Furumichi¹ and Mao Tanabe¹
KEGG is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. It provides molecular functions of genes and proteins associated with ortholog groups in the KEGG Orthology (KO) database. KEGG pathway maps, BRITE hierarchies, and KEGG modules represent high-level functions of the cell and organism. Over 4000 complete genomes are annotated with KOs in the KEGG GENES database, serving as a reference for KO assignment and pathway reconstruction. Improvements include re-examining KO records, adding viruses, plasmids, and functionally characterized proteins, and introducing new annotation servers, BlastKOALA and GhostKOALA, using non-redundant pangenome data. KEGG also provides data sets for antimicrobial resistance and drug interaction networks.
KEGG consists of 16 main databases, grouped into systems, genomic, chemical, and health information. The PATHWAY, BRITE, and MODULE databases represent high-level functions. The GENOME and GENES databases contain complete genomes and their gene catalogs. The KO database links genes to high-level functions. The COMPOUND, GLYCAN, REACTION, RPAIR, RCLASS, and ENZYME databases describe chemical substances and reactions. The DISEASE, DRUG, DGROUP, and ENVIRON databases provide disease and drug information. KEGG MEDICUS integrates these databases with drug labels.
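The grouping described above can be written down as a small lookup structure. This is only an illustrative sketch: the category keys and the helper `category_of` are names chosen here, and placing KO under the genomic category is an assumption based on its role of linking genes to functions.

```python
# The 16 main KEGG databases, grouped by the four categories named in the text.
# Assumption: KO is filed under "genomic"; the text only says it links genes
# to high-level functions.
KEGG_DATABASES = {
    "systems": ["PATHWAY", "BRITE", "MODULE"],
    "genomic": ["GENOME", "GENES", "KO"],
    "chemical": ["COMPOUND", "GLYCAN", "REACTION", "RPAIR", "RCLASS", "ENZYME"],
    "health": ["DISEASE", "DRUG", "DGROUP", "ENVIRON"],
}


def category_of(database: str) -> str:
    """Return the category a database belongs to (case-insensitive)."""
    name = database.upper()
    for category, members in KEGG_DATABASES.items():
        if name in members:
            return category
    raise KeyError(f"unknown KEGG database: {database}")


# Sanity check that the listing really adds up to 16 databases.
total = sum(len(members) for members in KEGG_DATABASES.values())
```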
Each KO represents an ortholog group defined by sequence similarity. KO groupings are revised as experimental evidence becomes available. The GENES database now includes viruses, plasmids, and an addendum category. The addendum category holds manually created protein sequence entries for functionally characterized proteins. KEGG organisms are identified by three- or four-letter codes. The ordering of KEGG organisms is consistent with the NCBI taxonomy, with the first genome in each taxonomic rank considered a reference genome.
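As a concrete illustration of the organism codes, the sketch below splits a gene identifier into its organism code and gene name. It assumes identifiers take the form `org:gene` (for example `hsa:7157`) and that organism codes are lowercase letters only; `parse_gene_id` and `GeneId` are hypothetical names, and entries in the viruses, plasmids, and addendum categories are not handled.

```python
import re
from typing import NamedTuple

# Assumption: organism codes are exactly three or four lowercase letters.
ORG_CODE = re.compile(r"[a-z]{3,4}")


class GeneId(NamedTuple):
    organism: str
    gene: str


def parse_gene_id(identifier: str) -> GeneId:
    """Split an 'org:gene' identifier and validate the organism code."""
    organism, sep, gene = identifier.partition(":")
    if not sep or not gene:
        raise ValueError(f"expected 'org:gene', got {identifier!r}")
    if not ORG_CODE.fullmatch(organism):
        raise ValueError(f"not a three- or four-letter code: {organism!r}")
    return GeneId(organism, gene)
```

For example, `parse_gene_id("eco:b0002")` yields the organism code `eco` and the gene name `b0002`, while an identifier without a colon raises `ValueError`.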
BlastKOALA and GhostKOALA are automatic annotation servers that search non-redundant pangenome data. BlastKOALA is suited to annotating fully sequenced genomes, while GhostKOALA is suited to large data sets such as metagenomes. Both assign K numbers to query sequences, and the results can be passed to KEGG mapping to interpret high-level functions. GhostKOALA also reports the taxonomic composition of the data set.
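The step from K number assignment to KEGG mapping can be sketched offline as follows. The input format is an assumption: one tab-separated line per query gene, with the K number present only when one was assigned. The function names and the small `toy_table` are illustrative; the table is deliberately incomplete and stands in for a real KO-to-pathway lookup.

```python
import re
from collections import defaultdict

# K numbers are the letter K followed by five digits.
K_NUMBER = re.compile(r"K\d{5}")


def parse_assignments(text):
    """Parse 'gene<TAB>K number' lines; genes without a K number map to None."""
    assignments = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene = fields[0].strip()
        ko = fields[1].strip() if len(fields) > 1 else ""
        assignments[gene] = ko if K_NUMBER.fullmatch(ko) else None
    return assignments


def map_to_pathways(assignments, ko_to_pathways):
    """Group annotated genes by the pathways their K numbers belong to."""
    hits = defaultdict(set)
    for gene, ko in assignments.items():
        if ko is None:
            continue  # unannotated gene, nothing to map
        for pathway in ko_to_pathways.get(ko, ()):
            hits[pathway].add(gene)
    return dict(hits)


# Illustrative input and a toy, incomplete lookup table (not KEGG output).
example = "gene1\tK00844\ngene2\ngene3\tK01810\n"
toy_table = {"K00844": ["map00010"], "K01810": ["map00010"]}
result = map_to_pathways(parse_assignments(example), toy_table)
```

Here `gene2` has no K number and is left out of the mapping, while `gene1` and `gene3` are both placed on the single pathway in the toy table.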
KEGG represents knowledge on antimicrobial resistance (AMR) mechanisms in pathway maps and modules. Signature modules and KOs are used to characterize AMR from pathogen genomes. Beta-lactamase sequences are now included in the addendum category. Drug interaction networks are generalized using drug groups. The KEGG DRUG database contains drug interaction data taken from drug labels. Expressing these interactions in terms of drug groups (DG) expands the data and allows better detection of drug interactions.
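To show why a group-level representation helps detection, the sketch below stores interactions between drug groups and expands them to individual drugs on lookup. Everything here is hypothetical: the identifiers (`DG_X`, `D_1`, and so on), the data, and the function names are made up for illustration, and nested drug groups are not handled.

```python
def expand_members(entity, groups):
    """Return the drugs covered by an entity (a drug ID or a drug group ID)."""
    return set(groups.get(entity, {entity}))


def interacts(drug_a, drug_b, groups, interactions):
    """True if any recorded interaction covers the pair, in either order."""
    for left, right in interactions:
        a = expand_members(left, groups)
        b = expand_members(right, groups)
        if (drug_a in a and drug_b in b) or (drug_a in b and drug_b in a):
            return True
    return False


# Hypothetical group membership and interaction records.
groups = {"DG_X": {"D_1", "D_2"}, "DG_Y": {"D_3"}}
interactions = [("DG_X", "DG_Y"), ("D_4", "DG_X")]
```

With a single group-level record `("DG_X", "DG_Y")`, the pair `D_1` and `D_3` is detected even though no record names those two drugs directly, which is the generalization that the drug group representation provides.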
KEGG is accessible via the main website and the GenomeNet mirror. BlastKOALA and GhostKOALA are maintained on the main website, while other tools are on GenomeNet. KEGG provides resources for translational bioinformatics, including antimicrobial resistance and drug interaction networks.