Understanding Hidden Markov model speed heuristic and iterative HMM search procedure

This study introduces HMMERHEAD, a heuristic database-filtering method that substantially reduces the time needed to score profile hidden Markov models (profile-HMMs) against large sequence databases. HMMERHEAD cuts search time 20-fold for the Forward algorithm and 6-fold for the Viterbi algorithm, with minimal loss in sensitivity. The speedup also makes practical an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than existing methods such as SAM's T2K and NCBI's PSI-BLAST.

HMMERHEAD is a cascade of filters. It first identifies significant "words" in the profile-HMM and uses them to seed ungapped alignments against database sequences. Sequences that pass these filters are then aligned with gapped Viterbi alignment, and only those whose Viterbi score exceeds a threshold are scored with the full HMMER algorithm. JackHMMER is an iterative method: it starts from initial homologs identified by a database search, builds a hidden Markov model from their alignment, searches the database with that model, and repeats this refinement until no new homologs are found. In detecting remote homologs it outperforms existing iterative methods, finding 14% more homologs than T2K and 28% more than PSI-BLAST.

The study also compares HMMERHEAD with other methods such as WU-BLAST and finds that HMMERHEAD maintains high sensitivity, with only a small loss in true homolog detection; the results indicate that this small loss is an acceptable trade-off for the speed gain. All methods are evaluated on a benchmark of 2,521 query alignments and 16,986 sequences. Together, HMMERHEAD and JackHMMER significantly improve the efficiency and sensitivity of profile-HMM searches.
The methods are available for download, and the benchmark data is provided for further analysis.
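The HMMERHEAD filter cascade and the JackHMMER iteration described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not HMMER's implementation: the profile is reduced to a position-specific score table with no insert or delete states, the ungapped stage scores the best segment anywhere on a seeded diagonal rather than extending outward from the seed, the gapped Viterbi stage is replaced by a Smith-Waterman local alignment with a linear gap cost, and every function name (`word_hits`, `ungapped_score`, `gapped_score`, `cascade`, `iterate_until_converged`) and threshold value is hypothetical. The iteration loop takes a caller-supplied `build_and_search` function standing in for "build a model from the current homologs and search the database".

```python
"""Toy filter cascade in the spirit of HMMERHEAD, plus a JackHMMER-style loop.

Assumptions (not the published implementation):
- a "profile" is a position-specific score table without insert/delete states;
- stage 3 uses Smith-Waterman with a linear gap cost as a stand-in for gapped Viterbi;
- all function names and thresholds are hypothetical.
"""
from typing import Callable, Dict, List, Set

Profile = List[Dict[str, float]]  # one {residue: score} dict per model position
MISMATCH = -1.0                   # score for any residue absent from a position's dict


def word_hits(profile: Profile, seq: str, w: int, t: float) -> List[int]:
    """Stage 1: diagonals (i - j) where a length-w word in seq scores >= t at profile position i."""
    diags = set()
    for i in range(len(profile) - w + 1):
        for j in range(len(seq) - w + 1):
            score = sum(profile[i + k].get(seq[j + k], MISMATCH) for k in range(w))
            if score >= t:
                diags.add(i - j)
    return sorted(diags)


def ungapped_score(profile: Profile, seq: str, diag: int) -> float:
    """Stage 2: best ungapped segment on one diagonal (Kadane's maximum-subarray scan)."""
    best = run = 0.0
    j = max(0, -diag)
    i = j + diag
    while i < len(profile) and j < len(seq):
        run = max(0.0, run + profile[i].get(seq[j], MISMATCH))
        best = max(best, run)
        i, j = i + 1, j + 1
    return best


def gapped_score(profile: Profile, seq: str, gap: float) -> float:
    """Stage 3: local alignment score with a linear gap cost (stand-in for gapped Viterbi)."""
    prev = [0.0] * (len(seq) + 1)
    best = 0.0
    for i in range(1, len(profile) + 1):
        cur = [0.0] * (len(seq) + 1)
        for j in range(1, len(seq) + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + profile[i - 1].get(seq[j - 1], MISMATCH),
                         prev[j] - gap,      # profile position skipped
                         cur[j - 1] - gap)   # extra sequence residue
            best = max(best, cur[j])
        prev = cur
    return best


def cascade(profile: Profile, db: Dict[str, str], w: int = 3, t: float = 6.0,
            s_ungapped: float = 10.0, s_gapped: float = 12.0, gap: float = 4.0) -> Set[str]:
    """IDs surviving all three stages; only these would go on to full Forward/Viterbi scoring."""
    survivors = set()
    for sid, seq in db.items():
        diags = word_hits(profile, seq, w, t)
        if not any(ungapped_score(profile, seq, d) >= s_ungapped for d in diags):
            continue  # no seeded diagonal reaches the ungapped threshold
        if gapped_score(profile, seq, gap) >= s_gapped:
            survivors.add(sid)
    return survivors


def iterate_until_converged(seeds: Set[str],
                            build_and_search: Callable[[Set[str]], Set[str]],
                            max_rounds: int = 10) -> Set[str]:
    """JackHMMER-style loop: rebuild the model from current members, search, stop when nothing new appears."""
    members = set(seeds)
    for _ in range(max_rounds):
        new = build_and_search(members) - members
        if not new:
            break
        members |= new
    return members


profile = [{c: 3.0} for c in "WHKCYM"]  # toy profile: +3 for the motif residue, -1 otherwise
db = {"hit": "AAAWHKCYMAAA", "weak": "AAWHKAAA", "miss": "AAAAAAAAAAAA"}
print(cascade(profile, db))  # {'hit'}
```

In this toy run, "miss" is discarded at the word stage, "weak" has a seed but its best ungapped segment (9.0) falls below the ungapped threshold, and only "hit" reaches the quadratic-time gapped stage. The design point the sketch tries to capture is that each stage is cheaper than the next and discards most sequences, so the expensive dynamic programming runs on few survivors.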

Hidden Markov model speed heuristic and iterative HMM search procedure

2010 | L Steven Johnson, Sean R Eddy, Elon Portugaly