Autoregressive Image Generation without Vector Quantization

28 Jul 2024 | Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He
This paper challenges the conventional wisdom that autoregressive models for image generation must rely on vector-quantized tokens. The authors model the per-token probability distribution with a diffusion procedure, which lets autoregressive models operate on continuous-valued tokens without a discrete tokenizer. To this end, they introduce a *Diffusion Loss* that measures the per-token likelihood and is trained jointly with the autoregressive model, eliminating the need for vector quantization and improving both generation quality and speed. They further unify standard autoregressive models and masked generative models into a generalized autoregressive framework, and evaluate their approach across these cases, including standard autoregressive models and masked autoregressive (MAR) variants, demonstrating strong results. Experiments show that the method achieves strong (low) FID scores while generating images in less than 0.3 seconds each. The paper concludes by highlighting the potential of this approach for advancing autoregressive image generation and its broader applications.
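To make the Diffusion Loss idea concrete, below is a minimal PyTorch sketch, not the authors' exact implementation: the autoregressive backbone emits a continuous condition vector z for each token position, and a small MLP denoiser is trained with the standard noise-prediction objective on the continuous token x, conditioned on z. The class and argument names (`DenoisingMLP`, `hidden_dim`, the cosine-style noise schedule) are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class DenoisingMLP(nn.Module):
    """Small MLP that predicts the noise added to a continuous token x,
    conditioned on the diffusion timestep t and the AR model's output z."""
    def __init__(self, token_dim: int, cond_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, x_t, t, z):
        # t is a float tensor in [0, 1]; concatenate token, condition, and time.
        return self.net(torch.cat([x_t, z, t.unsqueeze(-1)], dim=-1))

def diffusion_loss(denoiser, x, z, n_timesteps: int = 1000):
    """Per-token Diffusion Loss: sample a timestep and Gaussian noise,
    corrupt the continuous token x, and regress the noise given z."""
    b = x.shape[0]
    t = torch.randint(0, n_timesteps, (b,), device=x.device)
    # Cosine-style alpha_bar schedule (an assumption; any standard schedule works).
    alpha_bar = torch.cos((t.float() / n_timesteps) * math.pi / 2) ** 2
    noise = torch.randn_like(x)
    x_t = alpha_bar.sqrt().unsqueeze(-1) * x + (1 - alpha_bar).sqrt().unsqueeze(-1) * noise
    pred = denoiser(x_t, t.float() / n_timesteps, z)
    return ((pred - noise) ** 2).mean()

# Usage sketch: z comes from the (masked) autoregressive backbone, x is the
# ground-truth continuous token (e.g., a VAE latent patch); gradients flow
# through z, so the denoiser and the AR model are trained jointly.
denoiser = DenoisingMLP(token_dim=16, cond_dim=768)
x = torch.randn(8, 16)   # continuous tokens
z = torch.randn(8, 768)  # AR model outputs (conditions)
loss = diffusion_loss(denoiser, x, z)
loss.backward()
```

At sampling time, the same small denoiser is run in reverse for each token, conditioned on z, to draw a continuous token from the learned per-token distribution.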