SOAP: short oligonucleotide alignment program

January 28, 2008 | Ruiqiang Li¹,², Yingrui Li¹, Karsten Kristiansen² and Jun Wang¹,²,*
SOAP is a program for efficient alignment of short oligonucleotides onto reference sequences. It is designed to handle the large volumes of short reads generated by next-generation sequencing technologies such as Illumina-Solexa. SOAP supports both gapped and ungapped alignments and has special modules for paired-end, small RNA, and mRNA tag sequence alignments. It is a command-driven program that supports multithreaded parallel computing and has a batch module for multiple query sets.

When aligning a read onto the reference sequence, SOAP allows either up to two mismatches or one continuous gap of 1–3 bp. The best hit, meaning the one with the fewest mismatches or the smaller gap, is reported. For multiple equal-best hits, the user can instruct the program to report all of them, report one at random, or disregard them all. SOAP can also iteratively trim the 3'-end of reads to improve alignment accuracy.

SOAP uses seed and hash look-up table algorithms to accelerate alignment. Both reads and reference sequences are converted to numeric data types. The program trades off memory usage against efficiency by using unsigned 3-byte data types.

SOAP is faster than blastn, by up to a factor of 1200 for ungapped alignments. It is suitable for applications including genome resequencing, small RNA discovery, and mRNA tag profiling. SOAP is written in standard C++ and runs on Macintosh or 64-bit Linux/Unix systems. The work is supported by the National Natural Science Foundation of China and the Danish Natural Science Research Council.
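
The seed and hash look-up idea mentioned in the summary can be illustrated with a minimal Python sketch. This is a generic seed-and-hash technique, not SOAP's actual implementation (SOAP is written in C++ and its internal layout is not described here). The 2-bit base encoding, the pigeonhole seeding scheme (a read with at most k mismatches, split into k+1 non-overlapping seeds, must contain at least one exact seed), and the names `encode`, `build_index`, and `align_ungapped` are all assumptions made for illustration. Only ungapped alignment over A/C/G/T is covered.

```python
from collections import defaultdict

# Assumed 2-bit encoding; the summary only says sequences become numeric.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def build_index(reference, seed_len):
    """Hash table: encoded seed -> list of start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[encode(reference[pos:pos + seed_len])].append(pos)
    return index


def align_ungapped(read, reference, index, seed_len, max_mismatches=2):
    """Return (start, mismatches) pairs, fewest mismatches first."""
    n_seeds = max_mismatches + 1
    if len(read) < n_seeds * seed_len:
        raise ValueError("read too short for the requested seeds")
    hits = {}
    for i in range(n_seeds):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        # Look up exact seed matches, then verify the whole read in place.
        for pos in index.get(encode(seed), []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items(), key=lambda hit: (hit[1], hit[0]))


reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
index = build_index(reference, seed_len=4)
# Read taken from position 8 with two substitutions introduced.
print(align_ungapped("TCGCCGATAGTC", reference, index, seed_len=4))  # [(8, 2)]
```

Only the seeds are looked up in the hash table; the full read is compared against the reference only at the candidate positions those seeds suggest, which is where the speed-up over scanning every position comes from.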
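
The reporting rule for equal-best hits and the iterative 3'-end trimming described above can be sketched as follows. This is an illustration under stated assumptions, not SOAP's code: `naive_hits` is a brute-force stand-in for the aligner, and the mode names (`"all"`, `"random"`, `"none"`), the trimming step, and the minimum length are hypothetical parameters chosen for the example.

```python
import random


def naive_hits(read, reference, max_mismatches=2):
    """Brute-force stand-in for an aligner: all (start, mismatches) within the limit."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def select_best(hits, mode="all", rng=random):
    """Keep the hit(s) with the fewest mismatches, then apply the tie policy."""
    if not hits:
        return []
    best = min(mismatches for _, mismatches in hits)
    ties = [hit for hit in hits if hit[1] == best]
    if len(ties) == 1 or mode == "all":
        return ties
    if mode == "random":
        return [rng.choice(ties)]
    if mode == "none":
        return []
    raise ValueError(f"unknown mode: {mode}")


def align_with_trimming(read, reference, min_len, step=2, mode="all"):
    """Shorten the read from its 3'-end until it aligns or gets too short."""
    length = len(read)
    while length >= min_len:
        hits = naive_hits(read[:length], reference)
        if hits:
            return length, select_best(hits, mode)
        length -= step
    return None, []


reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
# The first 12 bases match position 8; the last 4 bases are all wrong.
read = "TAGCCGATAGGCGGCC"
print(align_with_trimming(read, reference, min_len=10))  # (14, [(8, 2)])
```

In the example the full 16-base read has four mismatches at its true location and is rejected; after trimming two bases from the 3'-end, two mismatches remain and the read aligns at position 8.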