ControlVAR: Exploring Controllable Visual Autoregressive Modeling


14 Jun 2024 | Xiang Li; Kai Qiu; Hao Chen; Jason Kuen; Zhe Lin; Rita Singh; Bhiksha Raj
ControlVAR is a framework for controllable visual autoregressive modeling that enables flexible and efficient conditional generation. Unlike traditional conditional models that learn a conditional distribution, ControlVAR jointly models the distribution of images and pixel-level controls during training and imposes conditional controls at test time. It introduces a new conditional autoregressive paradigm: image and control representations are unified, and conditional generation is reformulated as joint modeling of the image and control during training. During inference, teacher-forcing guidance (TFG) is introduced to enable controllable sampling.

ControlVAR outperforms strong diffusion models such as ControlNet and T2I-Adapter on controlled image generation across various pixel-level controls. The framework supports multiple conditional generation tasks, including joint control-image generation, control/image completion, control-to-image generation, and image-to-control generation, and it also handles unseen tasks such as control-to-control generation, enhancing its flexibility and versatility. Evaluated on ImageNet, ControlVAR shows superior image quality and generation speed compared to existing methods, and it is effective on complex tasks such as image inpainting and image-to-control prediction. Its ability to incorporate additional controls into the autoregressive modeling process makes it a promising approach for controllable image generation.
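The core idea of teacher-forcing guidance can be sketched in a few lines: because the model is trained to generate control tokens and image tokens jointly, a known control can be injected at inference simply by forcing its tokens into the sequence and letting the model sample only the image tokens. The sketch below is a toy illustration under that assumption; `model`, the token layout, and all names are hypothetical stand-ins, not the paper's actual API.

```python
import random

random.seed(0)

VOCAB = list(range(8))          # toy token vocabulary (illustrative)
CONTROL_LEN, IMAGE_LEN = 4, 4   # toy lengths for control and image tokens

def model(prefix):
    """Stand-in next-token distribution: uniform over the toy vocabulary.
    A real model would condition on the prefix of previously emitted tokens."""
    return [1.0 / len(VOCAB)] * len(VOCAB)

def sample_with_tfg(control_tokens):
    """Teacher-forcing guidance over a jointly modeled sequence:
    control positions are fixed to the given condition (teacher-forced),
    while image positions are sampled from the model."""
    seq = []
    for t in range(CONTROL_LEN + IMAGE_LEN):
        probs = model(seq)
        if t < CONTROL_LEN:
            token = control_tokens[t]                        # force the control
        else:
            token = random.choices(VOCAB, weights=probs)[0]  # sample the image
        seq.append(token)
    return seq

control = [1, 2, 3, 4]
out = sample_with_tfg(control)
print(out[:CONTROL_LEN])  # the control prefix is preserved exactly
```

Because the joint model has seen control and image tokens together during training, fixing the control prefix steers the sampled image tokens without any extra adapter network, which is what distinguishes this paradigm from learning a separate conditional model per control type.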