11 Jun 2024 | Yue Zhao, Yuanjun Xiong*, Philipp Krähenbühl
The paper introduces a new transformer-based image and video tokenizer built on Binary Spherical Quantization (BSQ). BSQ projects high-dimensional visual embeddings onto a lower-dimensional unit hypersphere and then applies binary quantization. This makes it parameter-efficient (it needs no explicit codebook), scalable, and compact. The tokenizer uses a transformer encoder and decoder with block-wise causal masking so that it can handle videos of variable length. The resulting model, BSQ-ViT, achieves state-of-the-art reconstruction quality on image and video benchmarks and delivers 2.4× the throughput of the best prior methods. BSQ-ViT also does well beyond reconstruction. Its video compression results are competitive with state-of-the-art video compression standards, and its image synthesis results are comparable to those of GAN- and diffusion-based methods.
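To make the two mechanisms concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' code. It assumes a random matrix `proj` in place of the learned projection. It also assumes that the token id is the sign-bit pattern read as a binary integer. The names `bsq_quantize`, `block_causal_mask`, and `tokens_per_block` are hypothetical. Training would typically also need a straight-through gradient estimator for the non-differentiable sign step, which this sketch omits.

```python
import math
import random


def l2_normalize(v):
    """Rescale v to unit Euclidean norm, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def matvec(weight, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def bsq_quantize(z, proj):
    """Sketch of one binary spherical quantization step for an embedding z.

    proj is a hypothetical L x D matrix standing in for the learned layer that
    maps the D-dim encoder output to L dims (L much smaller than D).
    Returns (u, u_hat, index): u lies on the unit sphere in R^L, and
    u_hat = sign(u) / sqrt(L) is the closest of the 2**L hypercube corners,
    all of which also lie on that sphere.
    """
    u = l2_normalize(matvec(proj, z))
    dim = len(u)
    # Sign bits; a coordinate exactly equal to 0 maps to -1 (arbitrary choice here).
    bits = [1 if x > 0 else 0 for x in u]
    u_hat = [(1.0 if b else -1.0) / math.sqrt(dim) for b in bits]
    # Implicit codebook (assumed id convention): read the bit pattern as an
    # integer, so no codebook parameters are stored or searched.
    index = sum(b << i for i, b in enumerate(bits))
    return u, u_hat, index


def block_causal_mask(num_blocks, tokens_per_block):
    """mask[i][j] is True iff token i may attend to token j.

    Tokens attend freely within their own temporal block and to all earlier
    blocks, but never to later blocks.
    """
    n = num_blocks * tokens_per_block
    return [[j // tokens_per_block <= i // tokens_per_block for j in range(n)]
            for i in range(n)]


# Usage: quantize one random 64-dim embedding into an 8-bit token.
random.seed(0)
D, L = 64, 8
proj = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
z = [random.gauss(0.0, 1.0) for _ in range(D)]
u, u_hat, index = bsq_quantize(z, proj)
print("token id:", index)  # an integer in [0, 2**L)
print("quantization error:", math.dist(u, u_hat))

mask = block_causal_mask(num_blocks=3, tokens_per_block=2)
print(mask[0][2], mask[2][0], mask[2][3])  # False True True
```

All code vectors have the same norm, so the quantizer needs no nearest-neighbour search. Taking the signs of u already selects the closest corner on the sphere. The same mask works for any number of blocks, which matches the goal of supporting variable-length videos.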
The paper also discusses related work, including visual tokenization, video tokenization, neural compression, and video compression, and provides a detailed explanation of the BSQ mechanism and its advantages over other quantization methods.
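One property relevant to the comparison with other quantizers follows directly from the definition. This is our own short derivation, assuming $u \in \mathbb{R}^L$ with $\|u\|_2 = 1$ and $\hat u = \operatorname{sign}(u)/\sqrt{L}$, so that $\|\hat u\|_2 = 1$ and $\langle u, \hat u\rangle = \tfrac{1}{\sqrt{L}}\sum_i |u_i|$. It shows that the quantization error is bounded for every input:

$$
\|u - \hat u\|_2^2 = \|u\|_2^2 + \|\hat u\|_2^2 - 2\langle u, \hat u\rangle
= 2 - \frac{2}{\sqrt{L}} \sum_{i=1}^{L} |u_i|
\le 2 - \frac{2}{\sqrt{L}},
$$

where the inequality uses $\sum_i |u_i| = \|u\|_1 \ge \|u\|_2 = 1$. Hence $\|u - \hat u\|_2 \le \sqrt{2 - 2/\sqrt{L}}$, no matter which embedding the encoder produces.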