Residual Quantization with Implicit Neural Codebooks

May 22, 2024 | Iris A. M. Huijben, Matthijs Douze, Matthew Muckley, Ruud J. G. van Sloun, Jakob Verbeek
This paper introduces QINCo, a neural residual vector quantizer that improves on conventional residual quantization (RQ) by generating data-dependent codebooks with neural networks. Whereas traditional RQ uses a fixed codebook at each quantization step, QINCo constructs a specialized codebook that depends on the approximation of the vector produced by the previous steps, adapting to the distribution of residuals and substantially reducing quantization error.

This data-dependent construction lets QINCo outperform state-of-the-art methods across multiple datasets and code sizes: on BigANN1M and Deep1M it achieves better nearest-neighbor search accuracy with 12-byte codes than the state-of-the-art UNQ does with 16-byte codes. QINCo also combines with inverted file indexing (IVF) and re-ranking techniques for fast approximate decoding, making it suitable for highly accurate large-scale similarity search.

The method is stable to train and has few hyperparameters. Its codes can be decoded from the most to the least significant byte, and such prefix codes yield accuracy on par with codes trained specifically for that length, which makes QINCo an effective multi-rate codec: the same model can serve different code lengths without significant loss in performance.

The paper compares QINCo with several baselines, including OPQ, RQ, LSQ, UNQ, RVPQ, and DeepQ, and shows that it significantly outperforms them in both reconstruction error and nearest-neighbor search accuracy. Compared with product quantization (PQ) and additive quantization (AQ), QINCo offers a better trade-off between search speed and accuracy. The method is implemented in Faiss, with IVF-QINCo providing a fast search pipeline that combines IVF, approximate decoding, and re-ranking with the QINCo decoder for large-scale similarity search.
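The core mechanism can be caricatured in a few lines of numpy: plain RQ assigns each residual to the nearest centroid in a fixed per-step codebook, while a QINCo-like quantizer specializes each step's codebook to the current partial reconstruction. The sketch below replaces the paper's neural codebook generator with a fixed random linear map (a hypothetical stand-in, chosen only to keep the example self-contained); the key property it illustrates is that the decoder can replay the same codebook construction from the codes alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, M = 8, 16, 4  # vector dim, codebook size, number of RQ steps

# Fixed base codebooks, one per step, as in plain RQ.
codebooks = rng.normal(size=(M, K, d))

# Stand-in for QINCo's neural codebook generator: a fixed linear map of
# the partial reconstruction, added to the base codebook. (Hypothetical;
# the paper trains a small neural network per step.)
adapters = rng.normal(scale=0.1, size=(M, d, K * d))

def step_codebook(m, x_hat):
    """Codebook for step m, specialized to the current approximation."""
    return codebooks[m] + (x_hat @ adapters[m]).reshape(K, d)

def encode(x):
    """Greedy residual quantization with data-dependent codebooks."""
    x_hat = np.zeros(d)
    codes = []
    for m in range(M):
        C = step_codebook(m, x_hat)
        k = int(np.argmin(((x - x_hat - C) ** 2).sum(axis=1)))
        codes.append(k)
        x_hat = x_hat + C[k]
    return codes, x_hat

def decode(codes):
    """Replay the same codebook construction from the codes alone."""
    x_hat = np.zeros(d)
    for m, k in enumerate(codes):
        x_hat = x_hat + step_codebook(m, x_hat)[k]
    return x_hat

x = rng.normal(size=d)
codes, x_hat = encode(x)
assert np.allclose(decode(codes), x_hat)  # codes alone suffice to decode
```

Because each step's codebook is a deterministic function of the reconstruction so far, truncating the code after any step still decodes correctly, which is what enables the prefix-code and multi-rate behavior described above.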
The paper also discusses the scalability of QINCo, showing that it can be trained on large datasets and that its performance improves with more training data. QINCo is also shown to be effective for high-dimensional data, with a variant called QINCo-LR that uses a low-rank projection to reduce the number of trainable parameters while maintaining performance. The paper concludes that QINCo is a flexible and effective method for vector quantization, with potential applications in data compression and similarity search.
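To see why a low-rank projection helps at high dimension, compare the parameter count of a full linear map from a partial reconstruction to a codebook update against a rank-r factorization. The sketch below is in the spirit of QINCo-LR but is not the paper's exact parameterization; the rank value is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, r = 128, 256, 32  # vector dim, codebook size, rank (hypothetical)

x_hat = rng.normal(size=d)  # a partial reconstruction

# Full linear map from x_hat to a (K, d) codebook update: d * K * d weights.
full_params = d * K * d

# Low-rank factorization W = U @ V, reducing the trainable parameters
# while producing a codebook update of the same shape.
U = rng.normal(size=(d, r))
V = rng.normal(size=(r, K * d))
lowrank_params = U.size + V.size

update = (x_hat @ U @ V).reshape(K, d)  # same-shaped codebook update
print(update.shape, lowrank_params / full_params)
```

Here the factorized map needs roughly a quarter of the parameters of the full map, and the ratio shrinks further as d grows, which matches the motivation given for QINCo-LR on high-dimensional data.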
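The IVF-QINCo pipeline mentioned above follows a standard two-stage search pattern: a cheap approximate scorer produces a shortlist, and an accurate but slower decoder re-ranks only that shortlist. The numpy sketch below mimics the pattern with stand-ins (coarse scalar quantization for the approximate stage, exact L2 for the re-ranking stage); it is not the Faiss implementation, only the shape of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 16, 1000
db = rng.normal(size=(n, d))  # database vectors
query = rng.normal(size=d)

# Stage 1: cheap approximate distances on a coarsely quantized copy
# (standing in for IVF shortlisting plus approximate decoding).
coarse = np.round(db * 4) / 4
approx_d = ((coarse - query) ** 2).sum(axis=1)
shortlist = np.argsort(approx_d)[:50]  # keep only the top-50 candidates

# Stage 2: re-rank the shortlist with accurate distances (standing in
# for re-ranking with the QINCo decoder, too costly to run on all n).
exact_d = ((db[shortlist] - query) ** 2).sum(axis=1)
best = shortlist[np.argsort(exact_d)[:10]]
```

The expensive decoder thus runs on 50 vectors instead of 1000, which is the trade-off that makes a heavy neural decoder like QINCo's practical for large-scale search.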