Published online 9 November 2010 | Rasko Leinonen, Hideaki Sugawara and Martin Shumway on behalf of the International Nucleotide Sequence Database Collaboration
The Sequence Read Archive (SRA) is an international public repository for next-generation sequence data, established by the International Nucleotide Sequence Database Collaboration (INSDC). The SRA is operated by the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). It aims to preserve public-domain sequencing data and provide free, unrestricted, and permanent access to it. As of mid-September 2010, the SRA contained over 500 billion reads, totaling 60 trillion base pairs, with almost 80% derived from the Illumina GA platform. The SRA supports widely used sequencing platforms such as Roche/454, Illumina Genome Analyzer, and SOLiD™, with support for other platforms coming soon. Recommended data submission levels and formats include base or SOLiD™ color calls and qualities for Illumina GA and SOLiD™ platforms, and signal information for the 454 platform. The SRA uses the NCBI SRA Toolkit for efficient storage and compression of data. The SRA is also exploring more efficient compression strategies to manage the growing volume of data.