Understanding Rfam: annotating non-coding RNAs in complete genomes

Rfam is a comprehensive database of non-coding RNA (ncRNA) families, each represented by a multiple sequence alignment and a profile stochastic context-free grammar (SCFG). It aims to facilitate the identification and classification of new members of known sequence families, and it provides annotation of ncRNAs in over 200 complete genome sequences. The database has grown from 25 families in release 1.0 to 379 families in release 6.1, many of them specific to certain taxa. Rfam includes not only bona fide ncRNA genes but also structured regions of mRNA transcripts, such as self-splicing introns and cis-regulatory elements. The database now contains 308 gene families, 69 cis-regulatory elements, and two self-splicing introns. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/, and all data are available for download and can be searched with the INFERNAL software package.

Rfam provides annotation of over 13,400 candidate ncRNA genes in 224 completed chromosomes and genomes. The average bacterial genome contains over 80 hits, dominated by tRNAs. Annotated regions in Bacillus genomes include recently described riboswitches. Together, these data provide the first comprehensive view of the distribution of ncRNAs across the three kingdoms of life.

A small number of very large families represent some of the best-understood RNAs. However, many families are highly divergent and computationally difficult to recognize. Rfam contains over 30 ncRNA families based on verified genes. Few large-scale studies have been conducted in archaea or eukaryotes, and such studies are expected to identify many more small families. Future challenges include the computational expense of profile SCFG searches and the need to include unstructured RNAs.
Rfam will continue to translate novel discoveries of ncRNA genes into alignments and models useful for genome annotation and phylogenetic analysis.
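
As a quick consistency check on the figures quoted above, the three family categories can be tallied against the release 6.1 total. This is only an illustrative sketch: the counts come from the summary, while the dictionary and variable names are assumptions made for this example and are not part of Rfam or INFERNAL.

```python
# Family counts as quoted in the summary above.
# The names used here are illustrative, not an Rfam data format.
family_counts = {
    "gene": 308,
    "cis-regulatory element": 69,
    "self-splicing intron": 2,
}

# The three categories should add up to the release 6.1 total.
total = sum(family_counts.values())

# Share of each category as a percentage of all families.
shares = {kind: 100 * n / total for kind, n in family_counts.items()}

for kind, n in family_counts.items():
    print(f"{kind:<24}{n:>4}  {shares[kind]:5.1f}%")
print(f"{'total':<24}{total:>4}")
```

The categories sum to the 379 families reported for release 6.1, with gene families making up roughly four fifths of the database.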

Rfam: annotating non-coding RNAs in complete genomes

2005 | Sam Griffiths-Jones*, Simon Moxon, Mhairi Marshall, Ajay Khanna¹, Sean R. Eddy¹ and Alex Bateman