Received September 15, 2004; Revised and Accepted October 8, 2004 | Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, Ajay Khanna, Sean R. Eddy and Alex Bateman
Rfam is a comprehensive database of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars (SCFGs). The database aims to facilitate the identification and classification of new members of known ncRNA families and provides annotation of ncRNAs in over 200 complete genomes. Recent improvements include the inclusion of structured regions of mRNA transcripts, such as self-splicing introns and cis-regulatory elements, and the introduction of a limited type ontology. Rfam has grown significantly, from annotating around 55,000 regions in release 1.0 to over 280,000 regions in release 6.1. The data provide insights into the conservation of multiple ncRNA families across different taxonomic groups, with a small number of large families being essential in all three kingdoms of life. Future challenges include improving computational efficiency, handling unstructured RNAs, and distinguishing functional copies from pseudogenes and repeats in higher eukaryotic genomes.