STRING: a database of predicted functional associations between proteins

2003 | Christian von Mering, Martijn Huynen, Daniel Jaeggi, Steffen Schmidt, Peer Bork and Berend Snel
STRING is a database of predicted functional associations between proteins. It provides a precomputed global resource for exploring and analyzing associations between proteins inferred from genomic data. The database uses a unique scoring framework based on benchmarks of different types of associations against a common reference set, integrated into a single confidence score per prediction. This allows users to assess and compare the significance of individual predictions. The graphical representation of the network of inferred, weighted protein interactions provides a high-level view of functional linkage, facilitating the analysis of modularity in biological processes. STRING is continuously updated and currently contains 261,033 orthologs in 89 fully sequenced genomes. The database predicts functional interactions with an expected level of accuracy of at least 80% for more than half of the genes. It is online at http://www.bork.embl-heidelberg.de/STRING/.
STRING uses genomic context to predict functional associations between proteins. This includes three types of evidence: gene fusion events, conserved gene order, and phylogenetic co-occurrence. The database integrates these three types of evidence into a single scoring framework, allowing the reliability of predictions to be assessed. The scoring framework is based on benchmarking against a common reference set, which makes the different types of genomic associations directly comparable.
The database also provides a network display that allows users to navigate through the combined functional associations and visualize the network of interactions. The network display also allows iteration, enabling users to zoom out of a particular module and visualize its connections to other modules. For independent computational analysis, the entire set of predictions is available as computer-readable flat files through the website.
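
As an illustration of how several benchmarked evidence scores could be merged into one confidence value, the sketch below combines them under an independence assumption, treating each score as the probability that the association is real. The text above does not give the actual integration formula, so the function name `combine_scores` and the formula itself are assumptions for illustration, not the published STRING method.

```python
from math import prod


def combine_scores(scores):
    """Combine per-evidence confidence scores into one value.

    Assumption (not taken from the source text): each score is the
    probability that the association is real, and the evidence types
    are independent, so the combined score is 1 - prod(1 - s_i).
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


if __name__ == "__main__":
    # Hypothetical scores for fusion, gene order, and co-occurrence evidence.
    print(combine_scores([0.5, 0.5, 0.0]))
```

Under this assumption, two moderately confident lines of evidence reinforce each other, while an evidence type with a score of 0 leaves the combined value unchanged.
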
The prediction algorithms used in STRING have been validated previously, with only minor modifications made. The requirements for detecting gene fusions are stricter than in previously published methods. Fused proteins are recognized not by homology but by orthology of the fused parts to other, non-fused proteins. For neighborhood evidence, a repeatedly occurring neighborhood is required in species that are sufficiently remote to uncover functional constraints on gene order.
For the analysis of gene co-occurrence, STRING uses a measure from information theory, mutual information, which quantifies how much the knowledge that one gene is present in a genome reveals about the presence of another gene in the same genome. The specific algorithm used here corrects for biases in the number of genomes sequenced for a particular branch of the phylogeny by collapsing into a single node those taxa in which the presence or absence of a specific gene pair agrees in all species.
STRING relies on the annotated proteomes maintained by SWISS-PROT for information on genomes, genes, and encoded proteins. Assignment of functional equivalence of genes across these genomes is essential for the predictions, and this information is derived from the manually curated orthology database COGs. For any genomes not yet present in the COG database, orthology assignments are made by an automatic method resembling the COG procedure. This results
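
The co-occurrence measure described above can be sketched as follows. The code computes the mutual information between two presence/absence profiles (one entry per genome) and applies a simplified collapsing step beforehand. The function names, the 0/1 profile encoding, and the representation of the phylogeny as a flat list of clades are all assumptions made for this example; the actual STRING algorithm works on a phylogenetic tree and is not specified in detail here.

```python
from collections import Counter
from math import log2


def collapse_clades(profile_a, profile_b, clades):
    """Collapse clades whose genomes all agree on the gene-pair state.

    `clades` is a hypothetical flat grouping: a list of lists of genome
    indices. A clade in which every genome has the same (a, b) state is
    counted once; otherwise its genomes are kept individually.
    """
    out_a, out_b = [], []
    for members in clades:
        states = {(profile_a[i], profile_b[i]) for i in members}
        if len(states) == 1:
            a, b = next(iter(states))
            out_a.append(a)
            out_b.append(b)
        else:
            for i in members:
                out_a.append(profile_a[i])
                out_b.append(profile_b[i])
    return out_a, out_b


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    n = len(profile_a)
    if n == 0:
        raise ValueError("profiles must not be empty")
    joint = Counter(zip(profile_a, profile_b))
    count_a = Counter(profile_a)
    count_b = Counter(profile_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts.
        ratio = p_ab * n * n / (count_a[a] * count_b[b])
        mi += p_ab * log2(ratio)
    return mi


if __name__ == "__main__":
    # Hypothetical profiles: genomes 0-2 form one closely related clade.
    gene_a = [1, 1, 1, 0]
    gene_b = [1, 1, 1, 0]
    a, b = collapse_clades(gene_a, gene_b, [[0, 1, 2], [3]])
    print(mutual_information(a, b))
```

In this toy case, the three closely related genomes that agree on the gene pair contribute a single observation after collapsing, so an over-sampled branch does not dominate the estimate.
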