Expanded microbial genome coverage and improved protein family annotation in the COG database

26 November 2014 | Michael Y. Galperin, Kira S. Makarova, Yuri I. Wolf and Eugene V. Koonin
The COG database, first created in 1997, has been a key resource for the functional annotation of microbial genomes. This update, the first since 2003, expands genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. Re-analysis of the COGs shows an error rate below 0.5% in the COG assignments and allows an assessment of the progress in functional genomics over the past 12 years. Functions have since been elucidated for many previously uncharacterized COGs, and many tentative assignments have been validated experimentally or by high-throughput methods.
The new version assigns functions to several widespread, conserved proteins involved in translation, notably in rRNA maturation and tRNA modification. The database now comprises 4631 COGs, 4215 of which contain fewer than 1000 genes. Annotations have been made more accurate, and COG names have been revised to standardize their format and to reflect experimental validation. The new version also uses more detailed functional categories, with some COGs now classified under 'Function unknown'. The database is publicly available and includes tools for genome annotation. The update also highlights the value of COG phyletic patterns (the presence or absence of each COG across the genomes in the collection) for identifying missed genes and improving genome annotation. The new version of the COG database is expected to become an important tool for microbial genomics.
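To illustrate how phyletic patterns can point to missed genes, here is a minimal sketch, not the COG project's own procedure. It assumes a simple list of (genome, COG) assignments; the genome names, the COG identifiers cogA to cogC, and the 0.6 threshold are hypothetical placeholders. Each COG's pattern is a presence/absence string over an ordered genome list, and a genome that lacks a COG present in most of the other genomes is flagged as a candidate for a gene that may have been missed during annotation.

```python
from collections import defaultdict

# Hypothetical (genome, COG) assignments; placeholder names, not real data.
assignments = [
    ("GenomeA", "cogA"), ("GenomeB", "cogA"), ("GenomeC", "cogA"),
    ("GenomeA", "cogB"), ("GenomeB", "cogB"),
    ("GenomeA", "cogC"),
]
genomes = ["GenomeA", "GenomeB", "GenomeC"]


def phyletic_patterns(assignments, genomes):
    """Map each COG to a '1'/'0' presence/absence string over the ordered genomes."""
    present = defaultdict(set)
    for genome, cog in assignments:
        present[cog].add(genome)
    return {
        cog: "".join("1" if g in members else "0" for g in genomes)
        for cog, members in present.items()
    }


def candidate_missed_genes(patterns, genomes, min_fraction=0.6):
    """Flag (genome, COG) pairs where the COG is absent from that genome
    but present in at least min_fraction of all genomes (assumed threshold)."""
    flagged = []
    for cog, pattern in sorted(patterns.items()):
        if pattern.count("1") / len(pattern) >= min_fraction:
            for genome, bit in zip(genomes, pattern):
                if bit == "0":
                    flagged.append((genome, cog))
    return flagged


patterns = phyletic_patterns(assignments, genomes)
print(patterns)
print(candidate_missed_genes(patterns, genomes))
```

Here cogB is present in two of the three genomes, so its absence from GenomeC is flagged, while cogC, found in only one genome, is not. A real analysis would compare genomes within a lineage rather than apply a single global threshold, and a flagged pair is only a lead for re-examining the genome sequence, not evidence of a missed gene.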