Database of homology-derived protein structures and the structural meaning of sequence alignment

The paper presents a database of homology-derived secondary structure of proteins (HSSP) and discusses the structural significance of sequence alignment. The database is built by aligning to each protein of known structure all sequences judged homologous according to a homology threshold curve that depends on alignment length. The curve is derived from an analysis of thousands of alignments between proteins of known structure, which quantifies the relationship between sequence similarity, structural similarity, and alignment length. For each known protein structure, the resulting database contains the aligned sequences, secondary structure, sequence variability, and a sequence profile. Because the structures of the aligned sequences are implied by homology rather than determined experimentally, the database effectively increases the number of known protein structures by a factor of five, to over 1800. The results are useful for assessing the structural significance of sequence database matches, deriving preferences and patterns for structure prediction, elucidating the structural role of conserved residues, and modeling three-dimensional detail by homology. The paper also notes that current structure prediction methods are limited by the small size of the structure database, stresses the importance of sequence homology for predicting protein structure, and addresses the difficulty of judging the structural significance of weak or short sequence similarities. The homology threshold is defined as a function of alignment length: the shorter the alignment, the higher the sequence similarity required. It is determined from the distribution of points in a three-dimensional scatter plot of sequence similarity, structural similarity, and alignment length, and it is then used to identify structurally homologous protein pairs. The HSSP database is generated by searching the sequence database for sequences homologous to each known protein structure and aligning them to that structure.
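The length dependence of the homology threshold can be illustrated with a short Python sketch. The summary does not state the curve itself, so the functional form and constants below are assumptions: they follow the values commonly quoted for this paper (a power law in alignment length up to 80 residues, then a constant of about 25% identity) and should be checked against the original publication before use. The function names are hypothetical.

```python
def homology_threshold(length, scale=290.15, exponent=-0.562,
                       plateau=24.8, min_length=10, plateau_length=80):
    """Percent sequence identity required for an alignment of `length` residues.

    The default constants are assumptions (values commonly quoted for the
    HSSP threshold curve), not figures taken from the summary text.
    """
    if length < min_length:
        # Very short alignments are treated as uninformative in this sketch.
        raise ValueError("alignment too short for a reliable inference")
    if length >= plateau_length:
        # Long alignments: a constant identity level is assumed to suffice.
        return plateau
    # Shorter alignments: the required identity rises as length falls.
    return scale * length ** exponent


def is_structurally_homologous(percent_identity, length):
    """True if the aligned pair lies at or above the threshold curve."""
    return percent_identity >= homology_threshold(length)


if __name__ == "__main__":
    for n in (20, 40, 80, 200):
        print(n, round(homology_threshold(n), 1))
```

Under these assumed constants, an alignment with 30% identity over 100 residues would pass the test, while the same identity over 20 residues would not, which is the qualitative behaviour the summary describes.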
The database includes measures of sequence variation, such as variability and entropy, derived from the multiple sequence alignments. It can also be used to study the evolution of protein sequence and structure, to derive more reliable preference parameters for structure prediction, and to extract weighted sequence profiles for database searches. The database is freely distributed and supports a range of applications in structural biology. The paper concludes with recommendations for using the HSSP database and with possible future extensions and applications.
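The per-position entropy and sequence profile mentioned above can be sketched as follows. This is a minimal illustration, assuming an unweighted Shannon entropy (natural logarithm) over the residues in each alignment column, with gaps ignored. The exact definitions, weighting, and scaling used in HSSP are not given in this summary and may differ; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # characters treated as gaps (an assumption of this sketch)


def column_profile(column):
    """Relative frequency of each residue type in one alignment column."""
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    if not residues:
        return {}
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


def column_entropy(column):
    """Shannon entropy of one column; zero when the column is fully conserved."""
    return sum(-p * math.log(p) for p in column_profile(column).values())


if __name__ == "__main__":
    # Toy alignment: one string per sequence, all of equal length.
    alignment = ["ACDE", "ACDF", "A-EF", "ACEY"]
    for position, column in enumerate(zip(*alignment), start=1):
        col = "".join(column)
        print(position, col, column_profile(col), round(column_entropy(col), 3))
```

Conserved positions give an entropy of zero, and positions where several residue types occur give higher values, which is why entropy serves as a simple indicator of which residues are likely to be structurally or functionally constrained.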

Database of Homology-Derived Protein Structures and the Structural Meaning of Sequence Alignment

1991 | Chris Sander and Reinhard Schneider