February 19, 2015 | Artem Tarasov, Albert J. Vilella, Edwin Cuppen, Isaac J. Nijman and Pjotr Prins
Sambamba is a high-performance, robust tool and library for processing SAM, BAM, and CRAM files, the standard formats for next-generation sequencing (NGS) data. It is a faster alternative to samtools that uses multi-core processing to reduce processing time. Sequencing centers have adopted Sambamba not only for its speed but also for additional features such as coverage analysis and powerful filtering.
Sambamba is free and open-source software, available under the GPLv2 license. It is written in the D programming language, which offers performance similar to C and supports efficient parallel processing. Sambamba uses D's parallel computing capabilities to improve performance on multi-core systems, and it integrates with the htslib C library for CRAM support and with the original samtools for mpileup processing.
Sambamba is a robust replacement for samtools commands such as INDEX, SORT, VIEW, MPILEUP, MARKDUP, MERGE, and FLAGSTAT. It also offers new functionality, including read depth analysis, slicing BAM files without decompression, and filtering with logical operators and regular expressions. In addition, it can generate JSON output for easier downstream processing.
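To make the filtering and JSON output concrete, here is a minimal sketch that assembles a `sambamba view` command line from Python. The helper name `sambamba_view_command`, the file name `sample.bam`, and the filter expression are hypothetical. The options used (`-t`, `-f`, `-F`) are those listed in the `sambamba view` documentation and should be checked against `sambamba view --help` for the installed version. The sketch only builds and prints the command; it does not run Sambamba.

```python
import shlex


def sambamba_view_command(bam_path, filter_expr, threads=4, output_format="json"):
    """Build (but do not run) an argument list for `sambamba view`.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    return [
        "sambamba", "view",
        "-t", str(threads),      # number of worker threads
        "-f", output_format,     # output format, here JSON
        "-F", filter_expr,       # filter expression
        bam_path,
    ]


# Assumed example filter: logical operators combined with a regular
# expression on the read name. The read-name prefix is made up.
example_filter = "mapping_quality >= 30 and not duplicate and read_name =~ /^HWI/"
command = sambamba_view_command("sample.bam", example_filter)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(command))
```

On a machine with Sambamba installed, the same list could be passed to `subprocess.run(command, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with the filter expression.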
Sambamba's performance is demonstrated in a human cancer exome SNV calling pipeline, where it reduced bioinformatics processing time from 2 hours to 30 minutes. It is most effective on machines where CPU utilization is the bottleneck; performance gains may be limited in cluster setups where shared storage is the bottleneck.
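The difference between CPU-bound and storage-bound setups can be illustrated with Amdahl's law, a general model of parallel speedup. The sketch below is not taken from the Sambamba benchmarks. The parallel fractions and core counts are assumed values chosen for illustration, and only the 2 hours and 30 minutes figures come from the text above.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that scales with core count
    (here, CPU work); the remainder (for example, waiting on shared
    storage) is assumed not to speed up at all.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Speedup reported in the text: 2 hours down to 30 minutes.
reported_speedup = 120 / 30

# Assumed, illustrative parallel fractions; not measured values.
for parallel_fraction in (0.5, 0.8, 0.95):
    for cores in (4, 8, 16):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"parallel={parallel_fraction:.2f} cores={cores:2d} "
              f"speedup={speedup:.2f}x")

print(f"reported pipeline speedup: {reported_speedup:.1f}x")
```

Under this model, a job that spends half of its time waiting on storage cannot be sped up by more than 2x regardless of core count, which is consistent with the limited gains described for shared-storage clusters.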
Sambamba adheres to the 'Small tools MANIFESTO for Bioinformatics', with extensible and maintainable source code. It uses Ragel for efficient SAM parsing and includes unit testing and continuous integration testing for reliability.
Sambamba exemplifies how to effectively use the D programming language and multi-core computers to reduce the time needed to process NGS data. It is particularly relevant in the context of whole genome sequencing and increasing sample numbers.