A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

January 7, 2011 | Guillaume Marçais and Carl Kingsford
Jellyfish is a fast, lock-free algorithm for efficient parallel counting of k-mer occurrences in DNA sequences. It is designed for shared-memory parallel computers with multiple cores and is built around a multithreaded, lock-free hash table optimized for counting k-mers of up to 31 bases. To keep memory usage low, it compresses keys and bit-packs the table entries. Jellyfish is implemented in C++, licensed under the GPL, and available for download at http://www.cbcb.umd.edu/software/jellyfish.
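Each of the four bases needs only 2 bits, so a k-mer of up to 31 bases fits in the low 62 bits of a single 64-bit machine word. That is the kind of compact key a bit-packed table can store. The sketch below shows one such packing under stated assumptions: the A=0, C=1, G=2, T=3 code assignment and the names `base_code`, `encode_kmer`, and `decode_kmer` are hypothetical and are not taken from Jellyfish's source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical 2-bit code per nucleotide: A=0, C=1, G=2, T=3.
inline std::optional<std::uint64_t> base_code(char c) {
    switch (c) {
        case 'A': case 'a': return std::uint64_t{0};
        case 'C': case 'c': return std::uint64_t{1};
        case 'G': case 'g': return std::uint64_t{2};
        case 'T': case 't': return std::uint64_t{3};
        default: return std::nullopt;  // N or another ambiguity code
    }
}

// Pack a k-mer (k <= 31) into the low 2k bits of a 64-bit word, first base
// in the most significant position. Returns nullopt on a non-ACGT character.
inline std::optional<std::uint64_t> encode_kmer(const std::string& kmer) {
    assert(kmer.size() <= 31);
    std::uint64_t word = 0;
    for (char c : kmer) {
        auto code = base_code(c);
        if (!code) return std::nullopt;
        word = (word << 2) | *code;
    }
    return word;
}

// Inverse of encode_kmer: unpack k bases from the low 2k bits of word.
inline std::string decode_kmer(std::uint64_t word, unsigned k) {
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    std::string kmer(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        kmer[k - 1 - i] = kBases[word & 3];
        word >>= 2;
    }
    return kmer;
}
```

For example, `encode_kmer("ACGT")` yields 27 (binary `00011011`), and `decode_kmer(27, 4)` recovers `"ACGT"`. K-mers containing an ambiguous base such as `N` are rejected rather than silently mapped to a real base.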
Jellyfish is significantly faster and more memory-efficient than existing k-mer counting tools such as Meryl and Tallymer, and it can serve applications including genome assembly, error correction, repeat detection, and multiple sequence alignment. Its lock-free hash table resolves collisions with a quadratic reprobing function, and a key-encoding scheme allows k-mer counts to be stored and retrieved efficiently. Intermediate hash tables can be merged quickly, so large datasets are processed in a short time. Jellyfish's performance is demonstrated on several sequencing projects, including the turkey genome, where it outperforms the other tools in both speed and memory usage. This makes it well suited to large-scale genomic data processing across bioinformatics.
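The core idea of a lock-free counting table fits in a short sketch. In this illustration each slot is claimed with a single atomic compare-and-swap, and on a collision the thread reprobes at quadratically growing offsets instead of taking a lock. The sketch assumes keys are 2-bit-packed k-mers (so an all-ones word can serve as an empty marker), uses triangular-number offsets over a power-of-two table, and substitutes the SplitMix64 mixer for whatever hash Jellyfish actually uses. The names `LockFreeCounter` and `count_parallel` are hypothetical, and the key compression, bit-packed counters, and table merging described above are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical sketch of a lock-free counting hash table: open addressing,
// atomic compare-and-swap to claim slots, quadratic reprobing on collision.
class LockFreeCounter {
public:
    // Keys are assumed to be 2-bit-packed k-mers (k <= 31, at most 62 bits),
    // so the all-ones word can never be a real key and marks an empty slot.
    static constexpr std::uint64_t kEmpty = ~std::uint64_t{0};

    // The table has 2^log2_size slots; a power of two lets us mask, not mod.
    explicit LockFreeCounter(unsigned log2_size)
        : size_(std::uint64_t{1} << log2_size),
          mask_(size_ - 1),
          keys_(new std::atomic<std::uint64_t>[size_]),
          counts_(new std::atomic<std::uint64_t>[size_]) {
        for (std::uint64_t i = 0; i < size_; ++i) {
            keys_[i].store(kEmpty, std::memory_order_relaxed);
            counts_[i].store(0, std::memory_order_relaxed);
        }
    }

    // Record one occurrence of key; safe to call from many threads at once.
    // Returns false if every probed slot holds a different key (table full).
    bool add(std::uint64_t key) {
        assert(key != kEmpty);
        const std::uint64_t h = mix(key);
        for (std::uint64_t i = 0; i < size_; ++i) {
            // Triangular offsets 0, 1, 3, 6, ... visit every slot of a
            // power-of-two table within size_ probes.
            const std::uint64_t pos = (h + i * (i + 1) / 2) & mask_;
            std::uint64_t current = keys_[pos].load(std::memory_order_acquire);
            // Claim an empty slot with CAS. On failure, current receives the
            // key another thread installed first (which may equal ours).
            if (current == kEmpty &&
                keys_[pos].compare_exchange_strong(current, key,
                                                   std::memory_order_acq_rel)) {
                current = key;
            }
            if (current == key) {
                counts_[pos].fetch_add(1, std::memory_order_relaxed);
                return true;
            }
        }
        return false;
    }

    // Count for key, or 0 if absent. Call after all writers have joined.
    std::uint64_t count(std::uint64_t key) const {
        const std::uint64_t h = mix(key);
        for (std::uint64_t i = 0; i < size_; ++i) {
            const std::uint64_t pos = (h + i * (i + 1) / 2) & mask_;
            const std::uint64_t current = keys_[pos].load(std::memory_order_acquire);
            if (current == key) return counts_[pos].load(std::memory_order_relaxed);
            if (current == kEmpty) return 0;
        }
        return 0;
    }

private:
    // SplitMix64 finalizer as a stand-in hash; not Jellyfish's hash function.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    std::uint64_t size_;
    std::uint64_t mask_;
    std::unique_ptr<std::atomic<std::uint64_t>[]> keys_;
    std::unique_ptr<std::atomic<std::uint64_t>[]> counts_;
};

// Hypothetical driver: each thread counts a strided share of the keys.
// Returns false if any insertion failed because the table was full.
inline bool count_parallel(LockFreeCounter& table,
                           const std::vector<std::uint64_t>& keys,
                           unsigned n_threads) {
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&table, &keys, &ok, t, n_threads] {
            for (std::size_t i = t; i < keys.size(); i += n_threads) {
                if (!table.add(keys[i])) ok.store(false, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return ok.load();
}
```

Because a slot's key never changes once installed, two threads racing to insert the same k-mer either both find it, or one loses the compare-and-swap and then sees the winner's key. Either way each k-mer occupies exactly one slot. A production table would also need a policy for running out of space; here `add` simply returns `false`.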