Vol. 25 no. 16 2009, pages 2078–2079 | Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin and 1000 Genome Project Data Processing Subgroup
The article introduces the Sequence Alignment/Map (SAM) format, a generic and flexible alignment format for storing read alignments against reference sequences. It supports both short and long reads from a range of sequencing platforms. SAM is compact, efficient for random access, and is the format used for releasing alignments from the 1000 Genomes Project. The article also describes the SAMtools software package, which includes utilities for indexing, variant calling, and alignment viewing. SAMtools can handle large alignment sets, supports different types of reads, and provides detailed metadata. The article outlines the SAM format's structure, including mandatory and optional fields, and introduces the extended CIGAR string for more complex alignments. Additionally, it discusses the Binary Alignment/Map (BAM) format, a compressed binary representation of SAM, and the benefits of sorting and indexing SAM/BAM files for efficient data processing. The authors conclude that SAM and SAMtools offer a generic and modular approach to analyzing genomic sequencing data.
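To make the mandatory fields and the extended CIGAR string concrete, here is a minimal Python sketch that splits one SAM alignment line into its 11 mandatory columns and decodes the CIGAR string. It is an illustration, not part of SAMtools: the function names (`parse_cigar`, `parse_sam_line`, `reference_span`) and the example read are hypothetical, and the column names follow the current SAM specification, which may differ from the names used in the article itself.

```python
import re

# The 11 mandatory SAM columns, named as in the current SAM specification
# (assumption: the article may use older names for some of these columns).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

# One CIGAR element is a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Decode a CIGAR string into a list of (length, operation) pairs."""
    if cigar == "*":  # "*" means the CIGAR is unavailable
        return []
    ops = CIGAR_RE.findall(cigar)
    # Reject strings with characters the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def parse_sam_line(line):
    """Split one tab-delimited alignment line into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    record = {name: (int(value) if name in INT_FIELDS else value)
              for name, value in zip(SAM_FIELDS, cols)}
    record["CIGAR_OPS"] = parse_cigar(record["CIGAR"])
    record["OPTIONAL"] = cols[len(SAM_FIELDS):]  # TAG:TYPE:VALUE strings
    return record


def reference_span(ops):
    """Count reference bases covered: only M, D, N, = and X consume reference."""
    return sum(n for n, op in ops if op in "MDN=X")


# Hypothetical alignment line, invented for illustration only.
example = "read1\t0\tchr1\t100\t60\t3S5M1I4M2D6M\t*\t0\t0\tACGTACGTACGTACGTACG\t*\tNM:i:3"
record = parse_sam_line(example)
span = reference_span(record["CIGAR_OPS"])
```

In this example the read carries three soft-clipped bases, one insertion, and a two-base deletion, so its 19 sequenced bases cover 17 reference positions starting at position 100. Real pipelines would normally use SAMtools or an established parsing library rather than hand-written code like this.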