Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

8 Jun 2024 | Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang
This paper revisits non-autoregressive Transformers (NATs) for efficient image synthesis. NATs generate images through parallel decoding, which makes them markedly faster than autoregressive models, but their sample quality has lagged behind diffusion models. The paper traces much of this gap to the hand-crafted heuristics that govern NAT training and generation, notably the carefully tuned scheduling functions, which are often sub-optimal.

To address this, the authors propose AutoNAT, which treats the training and generation strategies as variables in a joint optimization problem and solves it with an alternating algorithm: one strategy is held fixed while the other is optimized, and the two steps are repeated (see the sketches below). AutoNAT is validated on four benchmarks: ImageNet-256, ImageNet-512, MS-COCO, and CC3M. Extensive experiments show that it outperforms previous NATs, matches the generation quality of diffusion models at a substantially lower inference cost, and achieves a 5× speedup over diffusion models equipped with fast samplers.
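For context on the parallel decoding mechanism: NATs of this family (e.g., MaskGIT) synthesize an image as a grid of discrete tokens, starting fully masked and committing many tokens per step. The PyTorch sketch below is a minimal illustration under stated assumptions: `model` is a hypothetical stand-in returning per-position logits, and the mask-token convention and cosine re-masking schedule are conventional choices from prior NATs, not AutoNAT's learned strategy.

```python
import math
import torch

@torch.no_grad()
def parallel_decode(model, num_tokens=256, vocab_size=1024, steps=8):
    """Start from an all-masked token grid and, at each step, commit the
    most confident parallel predictions while re-masking the rest."""
    MASK_ID = vocab_size  # assume the last id is reserved as the mask token
    tokens = torch.full((1, num_tokens), MASK_ID, dtype=torch.long)

    for step in range(steps):
        logits = model(tokens)  # hypothetical stub: (1, num_tokens, vocab_size)
        confidence, prediction = logits.softmax(-1).max(-1)

        masked = tokens == MASK_ID
        # Hand-crafted cosine schedule for how many tokens remain masked;
        # AutoNAT searches such generation hyperparameters instead of fixing them.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * (step + 1) / steps))

        # Commit the highest-confidence predictions among still-masked positions.
        confidence = confidence.masked_fill(~masked, float("-inf"))
        num_commit = int(masked.sum()) - keep_masked
        if num_commit > 0:
            idx = confidence.topk(num_commit, dim=-1).indices
            tokens.scatter_(1, idx, prediction.gather(1, idx))

    return tokens  # a VQ decoder maps this token grid back to pixels
```

Per-step choices like this re-masking schedule are exactly the kind of heuristics the paper argues are sub-optimal and replaces with automatically optimized strategies.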
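The alternating optimization itself follows a generic coordinate-descent pattern: hold one group of strategy variables fixed, optimize the other, then swap. The toy sketch below illustrates only that pattern on a scalar objective; AutoNAT's actual variables are training and generation hyperparameters, and its objective is a generation-quality metric, not this hypothetical quadratic.

```python
import numpy as np

def alternating_optimize(objective, x0, y0, rounds=5,
                         grid=np.linspace(0.0, 1.0, 101)):
    """Generic alternating optimization: each round holds one variable
    fixed and grid-searches the other. In AutoNAT, x and y would stand
    for the generation and training strategies; here they are scalars."""
    x, y = x0, y0
    for _ in range(rounds):
        x = min(grid, key=lambda v: objective(v, y))  # optimize x with y fixed
        y = min(grid, key=lambda v: objective(x, v))  # optimize y with x fixed
    return x, y

# Toy usage: a coupled objective where neither variable is optimal in isolation.
f = lambda x, y: (x - 0.3) ** 2 + (y - 0.7) ** 2 + 0.5 * x * y
print(alternating_optimize(f, x0=0.5, y0=0.5))
```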