VarScan: variant detection in massively parallel sequencing of individual and pooled samples

June 19, 2009 | Daniel C. Koboldt, Ken Chen, Todd Wylie, David E. Larson, Michael D. McLellan, Elaine R. Mardis, George M. Weinstock, Richard K. Wilson and Li Ding
VarScan is an open-source tool for detecting SNPs, insertions, and deletions in massively parallel sequencing data. It is compatible with several short read aligners, including BLAT, Newbler, cross_match, Bowtie, and Novoalign, and can analyze both individual and pooled samples. The tool is implemented in Perl with inline C and is freely available for non-commercial use at http://genome.wustl.edu/tools/cancer-genomics.

VarScan first scores and sorts each read's alignments, discarding reads with low sequence identity or with multiple alignments. It then screens the remaining alignments for sequence changes and merges variants observed in multiple reads into unique SNPs and indels. For each variant, VarScan reports the coverage, the number of supporting reads, base quality, and strand counts. Variant-calling thresholds can be set automatically or manually.

In a study using real data from targeted resequencing of 1,000 PCR amplicons on the Roche/454 and Illumina/Solexa platforms, VarScan showed high sensitivity and specificity. In the 454 data it detected 344 of 359 SNPs (95.82%), and 349 of the 359 SNPs were confirmed in the Illumina data or present in dbSNP. In the Illumina data, VarScan also detected 46 of 77 high-confidence small indels. Comparisons among the 454 data, the Illumina data, and dbSNP suggest 97% specificity in individual 454 data and 93% sensitivity in pooled Illumina data.

VarScan's support for pooled samples is particularly useful for finding low-frequency variants: it can detect variants at 1% frequency. Because it is open-source, modular, and platform-independent, VarScan can be extended to accommodate additional aligner outputs and data types, and it continues to evolve with new sequencing technologies and data-processing algorithms. The tool is recommended for large-scale targeted studies of genetic variation by deep resequencing.
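The processing steps described above (keep each read's best unique alignment, screen for sequence changes, merge per-read observations into per-position variants, then apply thresholds) can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not VarScan's Perl/C implementation: it handles SNPs only, assumes gap-free alignments that lie within the reference, and every name and threshold value here (`Alignment`, `best_unique_alignments`, `call_snps`, `min_cov`, `min_reads`, `min_freq`, `min_qual`) is hypothetical rather than taken from the tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (gap-free, SNPs only)."""
    read_id: str
    score: int        # aligner score; higher is better
    identity: float   # fraction of aligned bases matching the reference
    start: int        # 0-based reference position of the first aligned base
    bases: str        # aligned read bases
    quals: list       # Phred base qualities, one per base
    strand: str       # '+' or '-'


def best_unique_alignments(alns, min_identity=0.90):
    """Keep each read's single best-scoring alignment; drop ties and low identity."""
    by_read = defaultdict(list)
    for a in alns:
        by_read[a.read_id].append(a)
    kept = []
    for hits in by_read.values():
        hits.sort(key=lambda a: a.score, reverse=True)
        if len(hits) > 1 and hits[0].score == hits[1].score:
            continue  # equally good placements elsewhere: ambiguous read, discard
        if hits[0].identity >= min_identity:
            kept.append(hits[0])
    return kept


def call_snps(alns, ref, min_cov=8, min_reads=2, min_freq=0.20, min_qual=15):
    """Pile up bases per position and report non-reference alleles passing all thresholds.

    Threshold defaults are illustrative placeholders, not VarScan's documented defaults.
    """
    # pileup[pos][base] -> list of (quality, strand) for bases with quality >= min_qual
    pileup = defaultdict(lambda: defaultdict(list))
    for a in alns:
        for offset, (base, q) in enumerate(zip(a.bases, a.quals)):
            if q >= min_qual:
                pileup[a.start + offset][base].append((q, a.strand))

    calls = []
    for pos in sorted(pileup):
        alleles = pileup[pos]
        coverage = sum(len(obs) for obs in alleles.values())
        for base, obs in alleles.items():
            if base == ref[pos]:
                continue  # reference allele, not a variant
            support = len(obs)
            freq = support / coverage
            if coverage >= min_cov and support >= min_reads and freq >= min_freq:
                calls.append({
                    "pos": pos, "ref": ref[pos], "alt": base,
                    "coverage": coverage, "support": support,
                    "freq": round(freq, 3),
                    "avg_qual": round(sum(q for q, _ in obs) / support, 1),
                    "plus": sum(1 for _, s in obs if s == "+"),
                    "minus": sum(1 for _, s in obs if s == "-"),
                })
    return calls
```

For example, given ten uniquely aligned reads over a 10 bp reference where four carry an A>T change at position 4, `call_snps(best_unique_alignments(reads), ref)` would report one SNP with coverage 10, 4 supporting reads, and per-strand counts, while a read that aligns equally well in two places, or one with low identity, is dropped before the pileup. Filtering ambiguous and low-identity reads first keeps misplaced reads from inflating variant support.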
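Detecting a variant at 1% frequency in a pooled sample depends heavily on sequencing depth: at 1,000x, a 1% allele is expected in only about 10 reads. The sketch below is a generic binomial sampling calculation, not a method described for VarScan. It estimates how deep a pool must be sequenced to observe at least `k` variant reads with a chosen probability. It ignores sequencing errors, which matter at such low frequencies. The function names and the 95% target are hypothetical choices for illustration.

```python
from math import comb


def prob_at_least_k(depth, freq, k):
    """P(X >= k) for X ~ Binomial(depth, freq): chance of sampling >= k variant reads."""
    return 1.0 - sum(comb(depth, i) * freq**i * (1 - freq) ** (depth - i) for i in range(k))


def min_depth_for(freq, k, target=0.95, max_depth=100_000):
    """Smallest depth at which >= k variant reads are seen with probability >= target."""
    for depth in range(k, max_depth + 1):
        if prob_at_least_k(depth, freq, k) >= target:
            return depth
    return None  # target not reachable within max_depth
```

Under these assumptions, `min_depth_for(0.01, 2)` comes out at roughly 470x. This is one reason why pooled, low-frequency variant detection calls for deep resequencing of targeted regions.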