KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The PROBING data structure uses linear-probing hash tables and is designed for speed: compared with the widely used SRILM, it is 2.4 times as fast while using 57% of the memory. The TRIE data structure uses bit-level packing, sorted records, interpolation search, and optional quantization to reduce memory consumption; it uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. KenLM is open-source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. The paper describes the performance techniques used and benchmarks them against alternative implementations.
The PROBING data structure is a hash-table-based implementation for storing N-gram language models. It uses an array for unigram lookup and a linear-probing hash table for each n-gram order (2 ≤ n ≤ N); vocabulary lookup is a separate hash table mapping words to vocabulary indices. The TRIE data structure stores n-grams as a trie of sorted arrays, using bit-level packing to minimize memory and interpolation search to locate records. TRIE is more memory-efficient than SRILM and other lossless alternatives, and optional quantization reduces memory further at some expense in accuracy.
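The linear-probing lookup behind PROBING can be sketched as follows. This is an illustrative Python model, not KenLM's actual C++ implementation: the `ProbingTable` class, its methods, and the use of integer keys standing in for n-gram hashes are all assumptions made for the example.

```python
# Illustrative sketch of a linear-probing hash table in the spirit of
# KenLM's PROBING structure (hypothetical; KenLM itself is C++).
# Keys model 64-bit hashes of n-grams; values model (log prob, backoff).

class ProbingTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def insert(self, key, value):
        i = key % self.capacity
        # Linear probing: on collision, scan forward to the next free slot.
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.capacity
        self.keys[i] = key
        self.values[i] = value

    def lookup(self, key):
        i = key % self.capacity
        # Scan forward from the home slot; an empty slot means absence.
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.capacity
        return None  # n-gram not found; a real query would back off
```

Probing stays fast as long as the table is kept sparse, which is the speed-for-memory trade-off the PROBING structure makes.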
KenLM outperforms alternative language model implementations in both query speed and memory consumption, and is integrated into the Moses, cdec, and Joshua translation systems. The code is open-source and has minimal dependencies. The paper describes several optimizations, including minimizing state, storing backoff in state, and threading; these matter especially for large models. Benchmarks show that KenLM is significantly faster than the other implementations tested while using less memory than the lossless alternatives. The paper concludes that KenLM is a fast and memory-efficient language model implementation.
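The interpolation search that TRIE applies to its sorted arrays, mentioned above, can be sketched generically. This is a textbook version over a sorted list of integers, not KenLM's code; the function name and test data are illustrative assumptions.

```python
# Generic interpolation search over a sorted array of integer keys,
# the technique TRIE uses to locate records (illustrative sketch only).

def interpolation_search(arr, key):
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= key <= arr[hi]:
        if arr[hi] == arr[lo]:
            mid = lo  # all remaining keys are equal; probe anywhere
        else:
            # Estimate the position assuming keys are roughly
            # uniformly distributed over [arr[lo], arr[hi]].
            mid = lo + (key - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[mid] == key:
            return mid
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # key not present
```

On uniformly distributed keys this needs O(log log n) probes versus binary search's O(log n), which is why it suits the hashed, densely packed records that a trie over sorted arrays stores.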