Video Diffusion Alignment via Reward Gradients

11 Jul 2024 | Mihir Prabhudesai*, Russell Mendonca*, Zheyang Qin*, Katerina Fragkiadaki, Deepak Pathak
The paper introduces VADER, a method for adapting foundational video diffusion models to specific tasks using reward gradients. VADER leverages pre-trained reward models, which are built on top of powerful vision discriminative models, to align video diffusion models with desired task objectives. The approach is both sample-efficient and compute-efficient, since it backpropagates dense gradient information from the reward models into the video diffusion model. The paper demonstrates the effectiveness of VADER across various reward models and video diffusion models, showing improved performance over gradient-free methods such as DPO and DDPO. VADER also generalizes well to unseen prompts and achieves high-quality results in human evaluation. The code and model weights are available at https://vader-vid.github.io.
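To make the core idea concrete, the following is a minimal PyTorch sketch of reward-gradient finetuning. The Denoiser and RewardModel classes here are toy stand-ins invented for illustration, not VADER's actual video diffusion model or reward models; only the training pattern, backpropagating a differentiable, frozen reward through the sampling chain into the diffusion weights, reflects the approach the paper describes.

```python
# Hedged sketch: toy modules stand in for the real video diffusion
# model and pre-trained reward model. Only the gradient flow pattern
# (reward -> sampled output -> denoising steps -> model weights)
# mirrors the reward-gradient approach described in the paper.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy denoiser standing in for a video diffusion model's network."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 128), nn.ReLU(), nn.Linear(128, dim)
        )

    def forward(self, x, t):
        # Condition on the scalar timestep by concatenating it to x.
        t_emb = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_emb], dim=-1))

class RewardModel(nn.Module):
    """Toy differentiable reward; in the paper this is a pre-trained
    scorer (e.g. an aesthetic or text-video alignment model)."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        return self.score(x).mean()

denoiser, reward_model = Denoiser(), RewardModel()
reward_model.requires_grad_(False)  # the reward model stays frozen
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

num_steps = 10  # short sampling chain keeps the backward pass cheap
for it in range(100):
    x = torch.randn(8, 64)  # start generation from pure noise
    for k in range(num_steps):
        t = torch.tensor([[1.0 - k / num_steps]])
        # No torch.no_grad() here: gradients must flow through every
        # denoising step so the reward gradient reaches the weights.
        x = x - denoiser(x, t) / num_steps
    loss = -reward_model(x)  # maximize reward = minimize its negative
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the full backward pass through many denoising steps of a real video model is memory-intensive, a practical implementation would add memory-saving measures (e.g. gradient checkpointing or truncating backpropagation to the last few steps); the short chain above is a simplification for clarity.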