Reward Guided Latent Consistency Distillation


16 Mar 2024 | Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang
The paper introduces Reward Guided Latent Consistency Distillation (RG-LCD), a method that integrates feedback from a reward model (RM) into the latent consistency distillation (LCD) process. The approach aims to improve the sample quality of latent consistency models (LCMs) while preserving fast inference. By aligning the LCM's output with human preferences during training, RG-LCD enhances the quality of generated images without compromising efficiency. Concretely, the LCM is trained to maximize the reward associated with its single-step generation, using a differentiable RM. To address reward over-optimization, the authors propose a latent proxy RM (LRM) that serves as an intermediary between the LCM and the expert RM; optimizing through the LRM avoids the high-frequency noise that appears when optimizing the pixel-space RM directly.

This design yields improved FID scores on MS-COCO and higher HPSv2.1 scores on the HPSv2 test set. The paper shows that RG-LCD achieves a 25× inference speedup over the teacher LDM (Stable Diffusion v2.1) without sacrificing image quality. Human evaluations and automatic metrics further validate the effectiveness of RG-LCD, showing that it outperforms both standard LCMs and alternative reward-guided baselines in sample quality and efficiency.
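To make the training recipe concrete, the sketch below illustrates how the two ingredients described above could fit together in a single training step: a consistency-distillation loss on the student's single-step prediction, a reward term computed through a latent proxy RM, and an auxiliary loss that keeps the proxy aligned with a frozen pixel-space expert RM. This is a minimal, hypothetical PyTorch-style sketch, not the authors' implementation; the module names (`lcm`, `latent_rm`, `expert_rm`, `vae_decode`), the simplified consistency target, and the weighting `beta` are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def rg_lcd_step(lcm, teacher_target, latent_rm, expert_rm, vae_decode,
                z_t, t, text_emb, beta=1.0):
    """One hypothetical RG-LCD training step (illustrative sketch only).

    lcm            : student latent consistency model, maps (z_t, t, text) -> predicted clean latent
    teacher_target : consistency-distillation target latent (from the teacher LDM / EMA student)
    latent_rm      : latent proxy RM that scores latents directly
    expert_rm      : frozen pixel-space RM (e.g. a human-preference model) scoring decoded images
    vae_decode     : decoder from latents to images, used only to supervise the latent proxy RM
    """
    # 1) Standard latent consistency distillation: match the teacher's target latent.
    z0_student = lcm(z_t, t, text_emb)
    loss_lcd = F.mse_loss(z0_student, teacher_target)

    # 2) Reward guidance: maximize the latent proxy RM's score of the single-step generation,
    #    rather than back-propagating through the pixel-space RM directly.
    loss_reward = -latent_rm(z0_student, text_emb).mean()

    # 3) Keep the latent proxy RM aligned with the expert RM on decoded images
    #    (gradients here update only the proxy, so the student latent is detached).
    with torch.no_grad():
        expert_score = expert_rm(vae_decode(z0_student.detach()), text_emb)
    loss_lrm = F.mse_loss(latent_rm(z0_student.detach(), text_emb), expert_score)

    # The student is updated with the distillation + reward objective; the proxy RM
    # is updated with its alignment loss (typically via separate optimizers).
    return loss_lcd + beta * loss_reward, loss_lrm
```

In practice the consistency target, timestep sampling, and RM alignment loss are more involved than shown here; the sketch only conveys why routing the reward signal through a latent proxy keeps the student from over-optimizing pixel-space artifacts.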