Profile hidden Markov models

Profile hidden Markov models (profile HMMs) are probabilistic models that convert a multiple sequence alignment into a position-specific scoring system for searching databases for remotely homologous sequences. They complement standard pairwise comparison methods in large-scale sequence analysis. HMMs are a class of probabilistic models applicable to time series and linear sequences. They were widely applied to speech recognition before being introduced into computational biology, where profile HMMs now provide a coherent theory for profile methods.

A profile HMM models a sequence family with states that correspond to positions of the alignment, plus states that allow for insertions and deletions. The architecture is strongly linear and left-to-right. Profile HMMs are more sensitive than ungapped models but require careful parameter tuning to avoid overfitting.

Profile HMMs are used in sequence analysis tasks including gene finding, protein structure prediction, and fold recognition. In the CASP2 structure prediction exercise, HMM methods performed comparably to threading methods. In fold recognition, profile HMMs are sometimes viewed as 'mere sequence models', but they can also be applied to structural data, as in '3D/1D profiles'. They have also been used to model sequences of secondary structure symbols and to align those models to secondary structure predictions for new protein sequences.

Software packages such as SAM, HMMER, PFTOOLS, and BLOCKS implement profile HMMs or HMM-like models. Two large collections of annotated profile HMMs are available: the Pfam database and the PROSITE Profiles database. Both contain models for many common protein domains and are used for searching sequence databases.

The Human Genome Project generates a deluge of raw sequence data, making automated sequence classification and annotation essential.
Profile HMM methods provide a second tier of solid, sensitive, statistically based analysis tools that complement current BLAST and FASTA analyses. The combination of powerful new HMM software and large sequence alignment databases of conserved protein domains should help make automated classification and annotation a reality.
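
To make the idea of turning an alignment into a position-specific scoring system more concrete, here is a minimal Python sketch. It is not the method of SAM, HMMER, or any other package named above. The toy alignment, the 50% gap threshold for choosing match columns, the add-one pseudocounts, the uniform background distribution, and the position-independent transition probabilities in `T` are all assumptions chosen for illustration. The sketch estimates log-odds emission scores for each match state and then runs a Viterbi recursion over a left-to-right chain of match, insert, and delete states.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
NEG_INF = float("-inf")

# Hypothetical, position-independent transition log-probabilities.
# Real profile HMMs estimate these per position from the alignment.
T = {
    "MM": math.log(0.90), "MI": math.log(0.05), "MD": math.log(0.05),
    "IM": math.log(0.60), "II": math.log(0.40),
    "DM": math.log(0.60), "DD": math.log(0.40),
}


def match_columns(alignment, max_gap_fraction=0.5):
    """Return indices of columns treated as match states (mostly non-gap)."""
    n_rows = len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[j] == "-")
        if gaps / n_rows <= max_gap_fraction:
            cols.append(j)
    return cols


def match_log_odds(alignment, cols, pseudocount=1.0):
    """Per-match-state scores log(p_k(a) / q(a)) with a uniform background q."""
    background = 1.0 / len(ALPHABET)
    scores = []
    for j in cols:
        counts = {a: pseudocount for a in ALPHABET}
        for row in alignment:
            if row[j] != "-":
                counts[row[j]] += 1.0
        total = sum(counts.values())
        scores.append(
            {a: math.log(counts[a] / total / background) for a in ALPHABET}
        )
    return scores


def viterbi_score(seq, scores):
    """Best-path score of seq aligned globally to the whole model.

    vm/vi/vd[k][i] hold the best score of a path that has emitted seq[:i]
    and ends in match, insert, or delete state k. Insert emissions are
    scored as background, so they contribute 0 in log-odds terms.
    """
    K, L = len(scores), len(seq)
    vm = [[NEG_INF] * (L + 1) for _ in range(K + 1)]
    vi = [[NEG_INF] * (L + 1) for _ in range(K + 1)]
    vd = [[NEG_INF] * (L + 1) for _ in range(K + 1)]
    vm[0][0] = 0.0  # begin state, treated as match state 0
    for k in range(K + 1):
        for i in range(L + 1):
            if k >= 1 and i >= 1:
                vm[k][i] = scores[k - 1][seq[i - 1]] + max(
                    vm[k - 1][i - 1] + T["MM"],
                    vi[k - 1][i - 1] + T["IM"],
                    vd[k - 1][i - 1] + T["DM"],
                )
            if i >= 1:
                vi[k][i] = max(vm[k][i - 1] + T["MI"], vi[k][i - 1] + T["II"])
            if k >= 1:
                vd[k][i] = max(vm[k - 1][i] + T["MD"], vd[k - 1][i] + T["DD"])
    return max(vm[K][L], vi[K][L], vd[K][L])


# Toy alignment (made up for this example); "-" marks a gap.
alignment = ["ACDE-G", "ACDEFG", "AC-E-G", "ACDE-G"]
cols = match_columns(alignment)
profile = match_log_odds(alignment, cols)

for seq in ["ACDEG", "ACDEFG", "ACEG", "WWWWW"]:
    print(f"{seq}: {viterbi_score(seq, profile):.2f}")
```

In this sketch the consensus-like sequence should score highest, sequences that need one insertion or one deletion should score lower but still above an unrelated sequence, and the gap penalties arise from the transition probabilities rather than from a separately chosen gap cost. A production tool would also use position-specific transitions, better priors than add-one pseudocounts, and local alignment modes.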

Profile hidden Markov models

1998 | Sean R. Eddy