The COG database: an updated version includes eukaryotes

11 September 2003 | Roman L Tatusov, Natalie D Fedorova, John D Jackson, Aviva R Jacobs, Boris Kiryutin, Eugene V Koonin, Dmitri M Krylov, Raja Mazumder, Sergei L Mekhedov, Anastasia N Nikolskaya, B Sridhar Rao, Sergei Smirnov, Alexander V Sverdlov, Sona Vasudevan, Yuri I Wolf, Jodie J Yin and Darren A Natale
The COG database has been updated to include eukaryotes. The database contains Clusters of Orthologous Groups (COGs) of proteins from prokaryotic and unicellular eukaryotic genomes. The updated version adds eukaryotic orthologous groups (KOGs) for seven eukaryotic genomes: three animals (Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens), one plant (Arabidopsis thaliana), two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The COG collection currently consists of 138,458 proteins forming 4873 COGs, which account for 75% of the 185,505 predicted proteins in 66 unicellular genomes (63 prokaryotes and three unicellular eukaryotes). The KOG set includes 4852 clusters of orthologs containing 59,838 proteins, or about 54% of the analyzed eukaryotic gene products. The KOGs include a conserved core of genes present in all analyzed species, which makes up about 20% of the KOG set, far more than the ubiquitous portion of the COG set (~1% of the COGs). The difference is likely due to the small number of included eukaryotic genomes, but it may also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. The COG system has become a widely used tool for computational genomics, and the updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.
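The coverage and conserved-core figures quoted above are simple ratios, and a minimal Python sketch can make the arithmetic explicit. The protein counts are the ones given in the text; the cluster names, species labels, and function names are hypothetical and chosen only for illustration, not taken from the database.

```python
from typing import Dict, Set


def coverage(clustered: int, total: int) -> float:
    """Return the fraction of predicted proteins that fall into clusters."""
    if total <= 0:
        raise ValueError("total must be positive")
    return clustered / total


def ubiquitous_fraction(clusters: Dict[str, Set[str]], species: Set[str]) -> float:
    """Return the fraction of clusters that contain every listed species."""
    if not clusters:
        raise ValueError("no clusters given")
    core = [cid for cid, members in clusters.items() if species <= members]
    return len(core) / len(clusters)


# Counts quoted in the text for the unicellular COG set.
cog_coverage = coverage(138_458, 185_505)
print(f"COG coverage: {cog_coverage:.1%}")

# Toy example only: hypothetical cluster names and species labels.
toy_species = {"sp1", "sp2", "sp3"}
toy_clusters = {
    "toyA": {"sp1", "sp2", "sp3"},
    "toyB": {"sp1", "sp2"},
    "toyC": {"sp2", "sp3"},
    "toyD": {"sp1", "sp2", "sp3"},
    "toyE": {"sp3"},
}
core_share = ubiquitous_fraction(toy_clusters, toy_species)
print(f"Toy core share: {core_share:.0%}")
```

With only a handful of genomes, a cluster needs to be present in few species to count as ubiquitous, which is one reason the core share can be expected to shrink as more genomes are added.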
The COG system is extended to complex, multicellular eukaryotes by constructing clusters of probable orthologs, named KOGs, from the annotated proteins encoded in seven sequenced genomes: three animals, the green plant Arabidopsis thaliana, two fungi, and the microsporidian Encephalitozoon cuniculi. The construction of KOGs follows a procedure similar to that used for prokaryotic genomes, with additional steps to account for specific features of eukaryotic proteins. The KOGs are accompanied by a phyletic pattern search tool.
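A phyletic pattern is the set of species represented in a cluster, so a pattern search amounts to filtering clusters by which species must be present and which must be absent. The following Python sketch illustrates that idea on toy data. It is not the interface of the database's own tool: the cluster identifiers, their memberships, and the `search_patterns` function are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# The seven eukaryotic species named in the text.
SPECIES = [
    "A. thaliana", "C. elegans", "D. melanogaster", "H. sapiens",
    "S. cerevisiae", "S. pombe", "E. cuniculi",
]
ANIMALS = ["C. elegans", "D. melanogaster", "H. sapiens"]

# Hypothetical clusters; identifiers and memberships are invented.
TOY_KOGS: Dict[str, FrozenSet[str]] = {
    "toy0001": frozenset(SPECIES),
    "toy0002": frozenset(ANIMALS),
    "toy0003": frozenset({"S. cerevisiae", "S. pombe"}),
    "toy0004": frozenset(set(SPECIES) - {"E. cuniculi"}),
}


def search_patterns(
    clusters: Dict[str, FrozenSet[str]],
    present: Iterable[str] = (),
    absent: Iterable[str] = (),
) -> List[str]:
    """Return sorted IDs of clusters containing all `present` species
    and none of the `absent` species."""
    need, forbid = set(present), set(absent)
    return sorted(
        cid for cid, members in clusters.items()
        if need <= members and not (forbid & members)
    )


# Clusters found in all three animals and in no other species.
others = [s for s in SPECIES if s not in ANIMALS]
animal_specific = search_patterns(TOY_KOGS, present=ANIMALS, absent=others)

# Clusters present in every species (the conserved core).
core = search_patterns(TOY_KOGS, present=SPECIES)

print(animal_specific, core)
```

Species left out of both `present` and `absent` are treated as "don't care", which keeps a query short when only a few lineages matter.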