Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models


June 4, 2024 | Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökcen Eraslan, Avantika Lal, Sergey Levine, and Tommaso Biancalani
This paper proposes BRAID, a doubly conservative approach to fine-tuning diffusion models that bridges model-based optimization and generative modeling for AI-driven design. It targets overoptimization in the offline setting, where the true reward function is unknown and only a static offline dataset is available: naively maximizing a learned reward model pushes generation into regions where that model is unreliable, producing invalid designs. BRAID instead fine-tunes against a conservative reward model that adds a penalty outside the offline data distribution, preserving the reward model's extrapolation ability while discouraging out-of-distribution generations.

The method proceeds in two stages. First, a conservative reward model is trained on the offline data with an uncertainty-quantification term that assigns larger penalties in out-of-distribution regions; a sketch of this stage appears below.
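The paper leaves the uncertainty quantifier abstract; one common instantiation is a bootstrap-style ensemble, where disagreement among independently trained reward networks serves as the out-of-distribution penalty. The PyTorch sketch below illustrates that instantiation; `RewardNet`, `train_ensemble`, `conservative_reward`, and the penalty weight `lambda_pen` are illustrative names, not from the paper.

```python
# Hypothetical sketch of Stage 1: a conservative reward model via ensemble
# disagreement. Names and the ensemble instantiation are assumptions, not the
# paper's exact construction.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP reward predictor; one member of the ensemble."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def train_ensemble(models, optimizers, offline_loader, epochs: int = 10):
    """Fit each member on offline (design, reward) pairs. Members here differ
    only by random initialization; per-member bootstrap resampling is a common
    refinement."""
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in offline_loader:
            for model, opt in zip(models, optimizers):
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

def conservative_reward(models, x: torch.Tensor, lambda_pen: float = 1.0):
    """Penalize ensemble disagreement: r_hat(x) = mean_k r_k(x) - lambda * std_k r_k(x).

    Designs far from the offline data distribution, where members disagree,
    receive lower conservative reward.
    """
    preds = torch.stack([m(x) for m in models], dim=0)  # shape (K, batch)
    return preds.mean(dim=0) - lambda_pen * preds.std(dim=0)
```

Larger values of `lambda_pen` make the objective more conservative, trading extrapolation ability for safety against out-of-distribution designs.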
Second, the pre-trained diffusion model is fine-tuned to maximize the conservative reward, yielding high-quality designs while avoiding out-of-distribution ones; a KL penalty against the pre-trained model additionally keeps generations inside the valid design space (a sketch of this objective follows at the end of this summary).

Theoretical analysis shows that the approach can outperform the best designs in the offline data by exploiting the reward model's extrapolation ability while avoiding invalid designs. Empirical evaluations across diverse domains, including DNA/RNA sequences and images, support this: compared with existing approaches to model-based optimization with diffusion models, BRAID consistently outperforms the baselines in reward performance, effectively mitigating overoptimization and generating designs that surpass the best designs in the offline data.
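For concreteness, here is a minimal sketch of the Stage 2 objective: the expected conservative reward of generated samples minus a KL penalty anchoring the fine-tuned denoiser to the frozen pre-trained one. The `posterior(x, t)` interface, the `dim` and `num_steps` attributes, and the choice to backpropagate the reward through a reparameterized sampler are assumptions for illustration; the paper's actual training procedure may differ.

```python
# Hypothetical sketch of Stage 2: fine-tuning the diffusion sampler against the
# conservative reward with a KL anchor to the pre-trained model. Interfaces are
# assumed, not the paper's code.
import torch

def finetune_step(finetuned, pretrained, reward_models, optimizer,
                  batch_size: int = 32, beta: float = 0.1, lambda_pen: float = 1.0):
    """One gradient step on E[-r_hat(x0)] + beta * KL(finetuned || pretrained)."""
    x = torch.randn(batch_size, finetuned.dim)        # start from the prior
    kl = 0.0
    for t in reversed(range(finetuned.num_steps)):
        mu_new, sigma = finetuned.posterior(x, t)     # fine-tuned denoising step
        with torch.no_grad():                         # frozen reference model
            mu_old, _ = pretrained.posterior(x, t)
        # KL between Gaussians sharing the (fixed) per-step variance.
        kl = kl + ((mu_new - mu_old) ** 2).sum(dim=-1) / (2.0 * sigma ** 2)
        x = mu_new + sigma * torch.randn_like(mu_new) # reparameterized, differentiable
    # Conservative reward of the final sample, as in the Stage 1 sketch.
    preds = torch.stack([m(x) for m in reward_models], dim=0)
    r_hat = preds.mean(dim=0) - lambda_pen * preds.std(dim=0)
    loss = (-r_hat + beta * kl).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The KL term corresponds to the paper's penalization toward the pre-trained model; setting `beta` larger keeps samples closer to the valid design space learned during pre-training.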