2013 June : 22(11): 3124–3140 | JULIAN CATCHEN, PAUL A. HOHENLOHE, SUSAN BASSHAM, ANGEL AMORES, WILLIAM A. CRESKO
The article describes an extension of Stacks, a software package designed to efficiently analyze genotyping-by-sequencing (GBS) data for population genomics studies. Stacks now computes core population genomic summary statistics and SNP-by-SNP statistical tests, which can be examined along a reference genome using a smoothed sliding window. The software also exports results in several formats for downstream analysis packages. The authors describe the major steps of a Stacks analysis: demultiplexing raw sequence reads, grouping the data into loci, identifying polymorphic nucleotide sites, and determining allelic states. They also detail de novo and reference-guided stack formation, SNP identification using a bounded-error model, and the conversion of SNPs to haplotypes. The populations program in Stacks calculates core population genetics statistics and can kernel-smooth statistics that are aligned to a reference genome. Bootstrap resampling is implemented to test the statistical significance of genome-wide statistics. The efficacy of kernel-smoothed FST analysis is demonstrated using RAD-seq data from threespine stickleback populations. The authors conclude by discussing the advantages of Stacks over other pipelines and its potential for future developments in population genomics.
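
To make the kernel-smoothing step concrete, here is a minimal Python sketch of a Gaussian-weighted sliding-window average over per-SNP FST values. It is an illustration under stated assumptions, not the code of the populations program: the function name `kernel_smooth`, the default `sigma` of 150 kb, the cutoff at three standard deviations, and the toy positions and FST values are all assumptions made for this example.

```python
import math


def kernel_smooth(positions, values, sigma=150_000.0):
    """Return a Gaussian-weighted average of `values` centred on each position.

    positions: base-pair coordinates of SNPs on one chromosome (hypothetical input)
    values:    a per-SNP statistic such as FST, in the same order as positions
    sigma:     kernel standard deviation in bp (assumed default for this sketch)
    """
    smoothed = []
    for center in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = pos - center
            # Assumption: ignore SNPs farther than 3 * sigma from the window centre.
            if abs(dist) > 3 * sigma:
                continue
            weight = math.exp(-(dist * dist) / (2 * sigma * sigma))
            weighted_sum += weight * val
            weight_total += weight
        # The centre SNP always contributes weight 1, so weight_total is never zero.
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Toy data: three nearby SNPs and one isolated SNP far down the chromosome.
positions = [100_000, 200_000, 300_000, 2_000_000]
fst = [0.10, 0.50, 0.10, 0.90]
smoothed = kernel_smooth(positions, fst)
```

In this toy input the isolated SNP keeps its own value because no neighbour falls inside its window, while the high FST value at 200 kb is pulled toward its lower neighbours. Centring a window on every SNP keeps the sketch short; a fixed grid of window centres would work the same way.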