Twelve years of SAMtools and BCFtools

Petr Danecek, James K. Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O. Pollard, Andrew Whitwham, Thomas Keane, Shane A. McCarthy, Robert M. Davies, Heng Li
SAMtools and BCFtools are widely used tools for processing and analyzing high-throughput sequencing data. They provide a range of functionalities including file format conversion, sorting, querying, statistics, variant calling, and effect analysis. These tools have been continuously developed and maintained for twelve years, with many new features and improvements added over time. They are freely available under the MIT license and have been installed over a million times via Bioconda.

SAMtools was first released in 2009 and has undergone significant development, including support for the CRAM format. It includes a variety of commands for working with alignment data, such as viewing, sorting, and indexing. The tool has also gained support for amplicon-based sequencing projects. SAMtools has become faster with the ability to use threads for parallel processing, and it now supports indexing files as they are written.

BCFtools was developed to handle variant calling and is now a full-featured program with 21 commands and 38 plugins. It supports conversion between text VCF and binary BCF formats, and offers various tools for processing variant data. BCFtools includes variant callers and algorithms for analysis, such as SNP and indel calling, detection of runs of homozygosity, and copy-number variation calling. It also supports a dynamic plugin mechanism for specific tasks.

Both SAMtools and BCFtools are written in C, optimized for low memory consumption and high speed. They have been used to process and analyze sequencing data from a wide range of species, including vertebrates, non-vertebrates, pathogens, plants, and viruses. The tools have been continuously improved, with extensive testing and quality assurance measures in place. They are available on GitHub and have a large user base, with many contributions and feature requests. The tools are also used in various bioinformatics pipelines and have been cited in numerous publications.
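
As an illustration of two features mentioned above (threaded sorting with an index written alongside the output, and conversion from text VCF to binary BCF), the following is a minimal Python sketch that only assembles the command lines. It assumes `samtools` and `bcftools` are installed and on the PATH; the helper names and file names are hypothetical and are not taken from the paper.

```python
import shutil
import subprocess


def sort_and_index_cmd(in_bam, out_bam, threads=4):
    """Build a `samtools sort` command line.

    `-@` requests additional threads and `--write-index` asks samtools
    to write the index while the sorted output is being written.
    File names are hypothetical placeholders.
    """
    return ["samtools", "sort", "-@", str(threads),
            "--write-index", "-o", out_bam, in_bam]


def vcf_to_bcf_cmd(in_vcf, out_bcf):
    """Build a `bcftools view` command line that converts VCF to
    compressed binary BCF (`-O b`)."""
    return ["bcftools", "view", "-O", "b", "-o", out_bcf, in_vcf]


def run_if_available(cmd):
    """Run the command only if its executable is found on PATH.

    Returns True if the command was run, False if the tool is missing.
    """
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

A call such as `run_if_available(sort_and_index_cmd("sample.bam", "sample.sorted.bam"))` would run the sort only where SAMtools is installed. Building the argument list separately from running it keeps the sketch easy to inspect without the tools present.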