The SILVA ribosomal RNA gene database project: improved data processing and web-based tools

2013 | Christian Quast¹, Elmar Pruesse¹,², Pelin Yilmaz¹, Jan Gerken¹,², Timmy Schweer¹, Pablo Yarza³, Jörg Peplies³ and Frank Oliver Glöckner¹,²,*
The SILVA ribosomal RNA gene database project provides a comprehensive, quality-controlled database of aligned ribosomal RNA (rRNA) gene sequences from Bacteria, Archaea, and Eukaryota, together with web-based tools for browsing, searching, and downloading the data. Release 111 (July 2012) contains 3,194,778 small subunit (SSU) and 288,717 large subunit (LSU) rRNA gene sequences. New features include advanced quality control procedures, an improved rRNA gene aligner, and online tools for probe and primer evaluation. The SILVA taxonomy and datasets serve as a reference for high-throughput classification of data from next-generation sequencing approaches.
The SILVA databases are released periodically and are structured into two datasets for each gene: SILVA Parc and SILVA Ref. The Parc datasets contain the entire SILVA databases, while the Ref datasets are a subset of high-quality, nearly full-length sequences. All SILVA datasets carry rich contextual and sequence-associated information, including taxonomic classifications, multiple sequence alignments, type strain information, and the latest valid nomenclature. Sequences are quality-checked and available in ARB, FASTA, and CSV formats. The project covers all three domains of life and offers databases for both SSU and LSU rRNA genes.
Quality control has been improved, in part by using hidden Markov models for rRNA gene prediction. The quality criteria for sequences are derived from statistical analyses and apply thresholds to both sequence quality and alignment quality. The SILVA taxonomy has been revised to incorporate the latest taxonomic classifications, drawing on the List of Prokaryotic Names with Standing in Nomenclature (LPSN). The project also includes third-party contextual data, such as habitat descriptors and strain information.
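As an illustration of how a FASTA export with attached taxonomic classifications might feed a downstream analysis, the following Python sketch tallies sequences per domain. It assumes each header has the form `>ACCESSION taxon;taxon;...`, with the first element of the taxonomy path naming the domain, and that aligned exports mark gaps with `-` or `.`; both are assumptions about the export layout, not details given in this summary. The function names and the `min_length` parameter are hypothetical, and the sample records are invented.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split 'ACCESSION path;of;taxa' into (accession, [taxa]).

    Assumed layout: accession, one space, then a ';'-separated taxonomy path.
    """
    accession, _, path = header.partition(" ")
    taxa = [t.strip() for t in path.split(";") if t.strip()]
    return accession, taxa


def count_by_domain(handle, min_length=0):
    """Count sequences per top-level taxon, skipping short sequences.

    min_length is a hypothetical cutoff applied to the ungapped length;
    it is not one of the project's published thresholds.
    """
    counts = Counter()
    for header, seq in read_fasta(handle):
        ungapped = seq.replace("-", "").replace(".", "")
        if len(ungapped) < min_length:
            continue
        _, taxa = split_header(header)
        counts[taxa[0] if taxa else "Unclassified"] += 1
    return counts


if __name__ == "__main__":
    # Invented records, for demonstration only.
    demo = io.StringIO(
        ">X1.1.8 Bacteria;Proteobacteria\nACGU\nACGU\n"
        ">X2.1.4 Archaea;Euryarchaeota\nAC-GU\n"
    )
    print(count_by_domain(demo, min_length=4))
```

Streaming the file record by record keeps memory use flat, which matters for datasets with millions of sequences. A real pipeline would more likely use an established parser and the project's own documented export format.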
The SILVA Ref datasets contain high-quality, nearly full-length sequences, with inclusion criteria based on sequence length and alignment score. The SSU Ref NR dataset is a non-redundant version of the SSU Ref dataset, created by clustering sequences at 99% or 98% identity. It is recommended as the standard SILVA reference dataset for rRNA gene-based classification, phylogenetic analysis, and probe design.
The SILVA website provides core database access features, online tools, and extensive documentation. It includes a Taxonomy Browser, a Search page, and a Cart system for managing sequences. The Aligner page accepts sequence data for processing with SINA. The website also hosts information for partner projects such as the Living Tree Project and the Eukaryotic Taxonomy Working Group.
The SILVA project has introduced new tools for probe and primer evaluation, TestProbe and TestPrime, which let users evaluate the suitability of probes and primers against the SILVA datasets. The project also provides a direct