1994, Vol. 22, No. 11 | Sean R. Eddy* and Richard Durbin
The paper introduces a general approach to RNA sequence analysis using probabilistic models called "covariance models" (CMs). CMs flexibly describe both the secondary structure and primary sequence consensus of an RNA sequence family. The authors describe an algorithm for building CMs from existing sequence alignments or even from unaligned example sequences, using an iterative training procedure. The CMs are then used for consensus secondary structure prediction, multiple sequence alignment, and database similarity searching. The algorithms are tested on a trusted alignment of 1415 tRNA sequences and genomic sequence data from the C. elegans genome sequencing project. The results show that the automatically constructed tRNA CM is significantly more sensitive for database searching than custom-built tRNA searching programs and produces high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.
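The "covariance" in the model name refers to correlated substitutions at base-paired positions: when one side of a pair changes, the other tends to change with it so that pairing is preserved. As a minimal sketch of that idea, the Python below scores covariation between two alignment columns using mutual information. The choice of mutual information as the measure is an assumption made here for illustration (the summary above does not name one), and the function name `column_mutual_information` and the toy alignment are hypothetical. This is not the authors' model-building algorithm.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings; gaps are treated as
    ordinary symbols in this simplified sketch.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    freq_i = Counter(a for a, _ in pairs)   # single-column counts, column i
    freq_j = Counter(b for _, b in pairs)   # single-column counts, column j
    freq_ij = Counter(pairs)                # joint counts for the column pair

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 0 and 4 always form a
# Watson-Crick pair, while columns 1-3 are fully conserved.
toy_alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

paired_score = column_mutual_information(toy_alignment, 0, 4)
conserved_score = column_mutual_information(toy_alignment, 1, 2)
print(paired_score, conserved_score)
```

In this toy case the covarying pair of columns scores 2 bits, the maximum for a four-letter alphabet, while the two conserved columns score 0. A fully conserved base pair therefore carries no covariation signal, even though it may still be structurally paired.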