SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

2016 | Andrew J. Page, Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane and Simon R. Harris
SNP-sites is a software tool for rapidly and efficiently extracting single nucleotide polymorphisms (SNPs) from multi-FASTA alignments. It is implemented in C and available under the GNU GPL v3 license. The tool is designed to handle large datasets with modest computational resources, which makes it feasible to run on a standard desktop computer. It can process an 8.3 GB alignment file with 1842 taxa and 22,618 sites in 267 seconds using only 59 MB of RAM and one CPU core. SNP-sites can be installed through the Debian and Homebrew package managers and has been successfully tested on over 20 operating systems. It supports output in multiple formats, including FASTA, PHYLIP, and VCF, and is compatible with downstream analysis tools such as BCFtools and PLINK. The software is optimized for memory and I/O efficiency, with memory usage scaling with the volume of variation rather than the size of the input file. It outperforms existing tools such as JVarKit and TrimAl in memory usage and processing speed, especially for large datasets. SNP-sites has been tested on real data from Salmonella Typhi, demonstrating that it can handle large-scale genomic studies. The tool is suitable for prokaryotic population studies.
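
To make the idea of extracting variable sites concrete, here is a minimal Python sketch of a two-pass approach. It is an illustration only and not the SNP-sites implementation, which is written in C. The function names (`read_fasta`, `find_variable_sites`, `extract_sites`) and the choice of characters treated as missing data are assumptions made for this example. The first pass keeps one reference-length list plus the set of variable columns, which is one way memory can depend on the amount of variation rather than on the total input size.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption for this sketch: gaps and unknown bases count as missing data.
MISSING = set("-N?")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs one record at a time."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def find_variable_sites(lines: Iterable[str]) -> List[int]:
    """Pass 1: return 0-based columns with two or more distinct known bases."""
    reference: Optional[List[str]] = None  # first known base seen per column
    variable = set()
    for _, seq in read_fasta(lines):
        if reference is None:
            reference = list(seq)
        elif len(seq) != len(reference):
            raise ValueError("aligned sequences must have equal length")
        for i, base in enumerate(seq):
            if base in MISSING:
                continue
            if reference[i] in MISSING:
                reference[i] = base  # fill in the first known base
            elif base != reference[i]:
                variable.add(i)
    return sorted(variable)


def extract_sites(lines: Iterable[str], sites: List[int]) -> Iterator[Tuple[str, str]]:
    """Pass 2: yield each sequence reduced to the variable columns."""
    for name, seq in read_fasta(lines):
        yield name, "".join(seq[i] for i in sites)


# Toy alignment (hypothetical data, not from the paper).
alignment = """>s1
ACGTAC
>s2
ACGTAT
>s3
AAGT-C
""".splitlines()

sites = find_variable_sites(alignment)
snps = list(extract_sites(alignment, sites))
print(sites)  # [1, 5]
for name, bases in snps:
    print(f">{name}\n{bases}")
```

In this toy alignment, column 4 holds only `A` and a gap, so it is not reported as variable. For a real file, each pass would reopen and stream the file from disk instead of holding the lines in a list.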