Rfam: an RNA family database

Rfam is a database of multiple sequence alignments and covariance models representing non-coding RNA (ncRNA) families. It is available online at http://www.sanger.ac.uk/Software/Rfam/ (UK) and http://rfam.wustl.edu/ (US). Users can search a query sequence against a library of covariance models, view multiple sequence alignments, and access family annotations. The database can also be downloaded in flatfile format for local use with the INFERNAL package. The first release of Rfam (1.0) contains 25 families, which annotate over 50,000 ncRNA genes in the EMBL nucleotide database.
Rfam has three aims. First, it integrates existing curated structural RNA alignments into a common structure-annotated format, similar to Pfam's curated seed alignments. Second, it uses covariance models to search sequence databases and maintain automatically generated alignments of detectable homologues, similar to Pfam's full alignments. Third, it provides a system for automatically analyzing and annotating sequences for homologues to known structural RNAs.
Each family in Rfam is represented by two multiple sequence alignments and a covariance model. The seed alignment contains known representative members and is hand-curated with structural information. The seed alignment is used to build a covariance model with the CMBUILD program. The model is then used to search a nucleotide sequence database with the CMSEARCH program, and matches are aligned to the model with the CMALIGN program to produce the full alignment.
The nucleotide database searched is called RFAMSEQ and is built from a subset of the EMBL nucleotide database. RFAMSEQ 1 contains 1,075,317 sequences and over 5.3 billion bases. CM searches are computationally expensive, so an initial BLAST search is used to reduce the search space. All BLAST hits with P-value <10 are retrieved, and a family-specific window size is added to each end of the matches. The reduced database is then subjected to a full CM search.
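The BLAST pre-filtering step described above can be illustrated with a minimal sketch. It is not the Rfam implementation: the names `BlastHit`, `candidate_regions`, `window` and `max_p` are hypothetical, and the merging of overlapping padded regions and the clipping to sequence boundaries are assumptions added so that each base is passed to the CM search only once. The summary itself states only the threshold and the family-specific window.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BlastHit:
    """One BLAST match (hypothetical record; coordinates are 1-based, inclusive)."""
    seq_id: str
    start: int
    end: int
    p_value: float


def candidate_regions(
    hits: Iterable[BlastHit],
    window: int,
    seq_lengths: Dict[str, int],
    max_p: float = 10.0,
) -> Dict[str, List[Tuple[int, int]]]:
    """Pad each accepted hit by `window` bases on both sides and merge overlaps.

    Assumptions (not from the source): regions are clipped to the sequence
    bounds, and overlapping or adjacent padded regions are merged.
    """
    padded: Dict[str, List[Tuple[int, int]]] = {}
    for hit in hits:
        if hit.p_value >= max_p:
            continue  # keep only hits under the threshold
        lo, hi = sorted((hit.start, hit.end))  # reverse-strand hits may have start > end
        lo = max(1, lo - window)
        hi = min(seq_lengths[hit.seq_id], hi + window)
        padded.setdefault(hit.seq_id, []).append((lo, hi))

    regions: Dict[str, List[Tuple[int, int]]] = {}
    for seq_id, spans in padded.items():
        spans.sort()
        merged = [spans[0]]
        for lo, hi in spans[1:]:
            last_lo, last_hi = merged[-1]
            if lo <= last_hi + 1:  # overlapping or directly adjacent
                merged[-1] = (last_lo, max(last_hi, hi))
            else:
                merged.append((lo, hi))
        regions[seq_id] = merged
    return regions
```

Under these assumptions, only the returned regions would be extracted and handed to the full CM search. A larger window lowers the risk of truncating a true homologue at the cost of a larger reduced database.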
Rfam provides a web-based interface for searching DNA sequences against the library of covariance models. Users can view annotations, follow links to other databases and literature references, and access multiple sequence alignments in various formats. The alignments include secondary structure markup and color-encoded representations of co-varying columns. The web pages also allow users to quickly determine the species distribution within a family. Rfam is under active development and will increase in size and scope. It aims to translate new discoveries into useful and searchable RNA families. However, it faces computational challenges with covariance models. Technological advances are expected to make full CM searches more feasible in the future. Until then, BLAST is used to narrow the search space, though it may reduce search sensitivity. Rfam provides a useful tool for genome annotation and a comprehensive resource for RNA family information and multiple sequence alignments.

Rfam: an RNA family database

2003 | Sam Griffiths-Jones*, Alex Bateman, Mhairi Marshall, Ajay Khanna¹ and Sean R. Eddy¹