Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

June 4, 2024 | Masatoshi Uehara*1, Yulai Zhao*2, Ehsan Hajiramezanali1, Gabriele Scalia1, Gökcen Eraslan1, Avantika Lal1, Sergey Levine†3, and Tommaso Biancalani†1
The paper "Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models" addresses the challenge of combining generative modeling and model-based optimization (MBO) for AI-driven design problems, such as DNA/protein sequence design. The authors propose a hybrid method that fine-tunes diffusion models by optimizing reward models through reinforcement learning (RL). Unlike prior work that focuses on scenarios with accurate reward models, this paper addresses an offline setting where a reward model is unknown and must be learned from static offline datasets, a common scenario in scientific domains. To prevent overoptimization, the authors introduce a conservative fine-tuning approach called BRAID (douBly conseRvAtive fine-tuning diffusion moDels), which includes additional penalization outside of offline data distributions. The method is evaluated through empirical and theoretical analysis, demonstrating its effectiveness in outperforming the best designs in offline data while avoiding the generation of invalid designs. The paper also provides theoretical guarantees for the regret of the proposed approach, showing that it can leverage the extrapolation capabilities of reward models to generate high-quality designs.The paper "Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models" addresses the challenge of combining generative modeling and model-based optimization (MBO) for AI-driven design problems, such as DNA/protein sequence design. The authors propose a hybrid method that fine-tunes diffusion models by optimizing reward models through reinforcement learning (RL). Unlike prior work that focuses on scenarios with accurate reward models, this paper addresses an offline setting where a reward model is unknown and must be learned from static offline datasets, a common scenario in scientific domains. To prevent overoptimization, the authors introduce a conservative fine-tuning approach called BRAID (douBly conseRvAtive fine-tuning diffusion moDels), which includes additional penalization outside of offline data distributions. The method is evaluated through empirical and theoretical analysis, demonstrating its effectiveness in outperforming the best designs in offline data while avoiding the generation of invalid designs. The paper also provides theoretical guarantees for the regret of the proposed approach, showing that it can leverage the extrapolation capabilities of reward models to generate high-quality designs.