23 May 2024 | Elaine Lau, Stephen Zhewen Lu, Ling Pan, Doina Precup, Emmanuel Bengio
QGFN: Controllable Greediness with Action Values
This paper introduces QGFN, a method that combines Generative Flow Networks (GFNs) with an action-value function (Q) to create controllable sampling policies. GFNs are energy-based generative models that sample composite objects with probability proportional to their reward, which makes them well suited to producing diverse, high-utility samples. However, biasing GFNs towards high-utility samples without losing that diversity is non-trivial. QGFN leverages the connection between GFNs and reinforcement learning (RL): it combines the GFN policy with an action-value estimate, Q, into a greedier behaviour policy whose greediness is controlled by a single mixing parameter p.
The key idea of QGFN is to use the action-value function Q to prune or override the GFN policy's action choices, allowing greedier sampling while maintaining diversity. Three variants of QGFN are introduced: p-greedy, p-quantile, and p-of-max. Each variant relies on Q to a different degree and therefore strikes a different balance between greediness and diversity in the generated samples.
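To make the mixing concrete, here is a minimal NumPy sketch of how the three behaviour policies could be built from a GFN forward policy P_F(s, ·) and an action-value estimate Q(s, ·) at a single state. The function names, masking rules, and quantile convention are an illustrative reading of the variant descriptions in the summary, not the paper's reference implementation.

```python
import numpy as np

def p_greedy(p_f, q, p, rng):
    """With probability p take the Q-greedy action; otherwise sample from the GFN policy."""
    if rng.random() < p:
        return int(np.argmax(q))
    return int(rng.choice(len(p_f), p=p_f))

def p_quantile(p_f, q, p, rng):
    """Mask actions whose Q value falls below the p-th quantile of Q(s, .),
    then sample from the GFN policy renormalized over the surviving actions."""
    keep = q >= np.quantile(q, p)
    probs = np.where(keep, p_f, 0.0)
    return int(rng.choice(len(p_f), p=probs / probs.sum()))

def p_of_max(p_f, q, p, rng):
    """Mask actions whose Q value is below p times the best Q value, then sample
    from the renormalized GFN policy."""
    keep = q >= p * q.max()
    probs = np.where(keep, p_f, 0.0)
    return int(rng.choice(len(p_f), p=probs / probs.sum()))

# Toy state with four available actions.
rng = np.random.default_rng(0)
p_f = np.array([0.4, 0.3, 0.2, 0.1])   # GFN forward policy P_F(s, .)
q   = np.array([1.0, 3.0, 2.0, 0.5])   # action-value estimates Q(s, a)
for rule in (p_greedy, p_quantile, p_of_max):
    print(rule.__name__, rule(p_f, q, p=0.5, rng=rng))
```

With p = 0 each rule above recovers (for non-negative Q values) the plain GFN policy, and increasing p makes sampling increasingly Q-driven; the mixing parameter is thus a direct greediness knob.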
The paper evaluates QGFN on five standard tasks used in prior GFN work, including molecular design, RNA design, and bit-sequence generation. The results show that QGFN outperforms strong baselines, achieving higher average rewards and discovering reward modes more efficiently. By combining the strengths of GFNs and Q-learning, the method generates object sets that are both high-reward and diverse.
The paper also discusses the importance of the reward temperature parameter in GFNs and how it affects the balance between greediness and diversity. It shows that QGFN can adjust the greediness of the policy at inference time, without retraining, simply by changing the mixing parameter p, making it a flexible and effective method for generating diverse, high-reward samples. The results demonstrate that QGFN provides a favorable trade-off between reward and diversity during both training and inference.
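Because the trained GFN policy and Q network stay fixed and p only gates how they are combined at sampling time, greediness can be swept purely at inference. The toy rollout below is a hypothetical, self-contained stand-in (the state is just the list of actions taken so far, and the p-of-max rule from the sketch above is reused): the networks never change between runs, only p does, and greedier settings typically collapse onto fewer distinct objects.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 8

# Toy stand-ins for a *frozen* trained GFN forward policy and Q network.
def gfn_policy(state):
    logits = np.sin(np.arange(NUM_ACTIONS) + len(state))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def q_values(state):
    return 1.0 + np.cos(0.5 * np.arange(NUM_ACTIONS) + len(state))  # non-negative by construction

def rollout(p, horizon=6):
    """Build one object with the p-of-max behaviour policy; higher p is greedier."""
    state = []
    for _ in range(horizon):
        p_f, q = gfn_policy(state), q_values(state)
        probs = np.where(q >= p * q.max(), p_f, 0.0)
        probs /= probs.sum()
        state.append(int(rng.choice(NUM_ACTIONS, p=probs)))
    return tuple(state)

# Same frozen networks, different greediness at inference time -- no retraining.
for p in (0.0, 0.5, 0.9):
    distinct = {rollout(p) for _ in range(200)}
    print(f"p={p}: {len(distinct)} distinct objects out of 200 rollouts")
```

As the summary notes, the same trained networks are reused across all settings of p, which is what makes the greediness adjustable after training.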