The I-TASSER Suite: protein structure and function prediction

2015 January | Jianyi Yang, Renxiang Yan, Ambrish Roy, Dong Xu, Jonathan Poisson, and Yang Zhang
The I-TASSER Suite is a standalone software package for protein structure and function prediction. It was developed to address the challenge of assigning structure and function to all genes and gene products in the postgenomic era. The gap between the number of proteins with known sequences and those with experimentally characterized structures and functions is increasing, and computational methods are needed to bridge this gap. I-TASSER was originally designed for protein structure modeling using iterative threading assembly simulations and has been extended for structure-based function annotation by matching structure predictions with known functional templates. The I-TASSER Suite implements the I-TASSER-based protein structure and function modeling pipelines.
It consists of four general steps: threading template identification, iterative structure assembly simulation, model selection and refinement, and structure-based function annotation. The first step involves threading the query sequence through a nonredundant structure library to identify structural templates. LOMETS, a meta-threading method containing eight fold-recognition programs, is used for this purpose. The structure folding and reassembling are conducted by replica-exchange Monte Carlo simulations under the guidance of an optimized knowledge-based force field. The lowest free-energy conformations are identified by structure clustering, and a second round of assembly simulation is conducted to remove steric clashes and refine global topology. Final atomic structure models are constructed from the low-energy conformations by a two-step atomic-level energy minimization approach. The correctness of the global model is assessed by the confidence score, which is based on the significance of threading alignments and the density of structure clustering. The residue-level local quality of the structural models and B factor of the target protein are evaluated by a newly developed method, ResQ.
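The replica-exchange Monte Carlo idea mentioned above can be illustrated with a toy example. The following is a minimal sketch, not the I-TASSER implementation: the one-dimensional double-well `energy` function, the temperature ladder, the step size and every function name are hypothetical stand-ins chosen for illustration, whereas the real suite samples protein conformations under a knowledge-based force field. Several replicas run Metropolis moves at different temperatures, and neighbouring replicas periodically attempt to swap states so that the cold replica can escape local minima.

```python
import math
import random


def energy(x):
    """Hypothetical 1-D double-well energy; a stand-in for a real force field."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def accept(delta, rng):
    """Metropolis rule: always accept delta <= 0, else with probability exp(-delta)."""
    return delta <= 0.0 or rng.random() < math.exp(-delta)


def replica_exchange(temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Return (best_x, best_energy) found by a toy replica-exchange run."""
    rng = random.Random(seed)
    # One "conformation" (here just a number) per temperature, coldest first.
    states = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    best_x = min(states, key=energy)
    for _ in range(n_sweeps):
        # Local Monte Carlo move in every replica at its own temperature.
        for i, temp in enumerate(temperatures):
            trial = states[i] + rng.uniform(-step, step)
            if accept((energy(trial) - energy(states[i])) / temp, rng):
                states[i] = trial
                if energy(trial) < energy(best_x):
                    best_x = trial
        # Swap attempts between neighbouring temperatures:
        # accept with probability min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).
        for i in range(len(temperatures) - 1):
            beta_i = 1.0 / temperatures[i]
            beta_j = 1.0 / temperatures[i + 1]
            delta = (beta_i - beta_j) * (energy(states[i + 1]) - energy(states[i]))
            if accept(delta, rng):
                states[i], states[i + 1] = states[i + 1], states[i]
    return best_x, energy(best_x)


if __name__ == "__main__":
    x, e = replica_exchange([0.05, 0.2, 0.8, 3.0])
    print(f"lowest-energy state x={x:.3f}, E={e:.3f}")
```

With the assumed temperature ladder, the hot replicas cross the barrier between the two wells easily, and the swaps pass low-energy states down to the cold replica, which then refines them.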
For function annotation, the structure models with the highest confidence scores are matched against the BioLiP database of ligand-protein interactions to detect homologous function templates. Functional insights on ligand-binding sites (LBS), Enzyme Commission (EC) numbers and Gene Ontology (GO) terms are deduced from the functional templates. Three complementary algorithms (COFACTOR, TM-SITE and S-SITE) were developed to enhance function inference, and their consensus is derived by COACH using support vector machines.
The I-TASSER Suite was tested in recent community-wide structure and function prediction experiments, including CASP10 and CAMEO. Overall, I-TASSER generated the correct fold, with a template modeling score (TM-score) >0.5, for 10 out of 36 "New Fold" (NF) targets in CASP10, which have no homologous templates in the Protein Data Bank (PDB). Of the 110 template-based modeling targets, 92 had a TM-score >0.5, and 89 had the templates drawn closer to the native with an average r.m.s. deviation improvement of 1.05 Å in the same threading-aligned regions. In CAMEO [...]
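The TM-score threshold of 0.5 quoted above is easier to interpret with the score's definition at hand. The sketch below evaluates the standard TM-score sum, (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition. It assumes the per-residue distances `distances` (in Å, between aligned model and native residues) have already been computed; the actual TM-score additionally maximizes this sum over superpositions, which is not done here. The function names and the 0.5 Å floor on d0 for short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) used by the TM-score."""
    if target_length <= 15:
        # Assumption of this sketch: fall back to a small fixed scale for very short chains.
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumption of this sketch: keep d0 from dropping below 0.5 Angstrom.
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score sum for ONE given superposition.

    distances: distances (Angstrom) between aligned residue pairs.
    target_length: length of the native (target) protein, used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    length = 100
    # Hypothetical distances: every aligned pair sits exactly d0 apart.
    print(tm_score([tm_d0(length)] * length, length))
```

Because unaligned residues contribute nothing while the sum is still divided by the full target length, a model covering only part of the target is penalized, and because d0 grows with length, the score is comparable across proteins of different sizes.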