QGFN: Controllable Greediness with Action Values

23 May 2024 | Elaine Lau, Stephen Zhenwen Lu, Ling Pan, Doina Precup, Emmanuel Bengio
Generative Flow Networks (GFNs) are a family of energy-based methods for generating combinatorial objects, capable of producing diverse and high-utility samples. However, biasing GFNs to produce high-utility samples consistently is challenging. This paper leverages connections between GFNs and reinforcement learning (RL) to propose QGFN, which combines a GFN policy with an action-value estimate, $Q$, to create greedier sampling policies that can be controlled by a mixing parameter, $p$. The authors introduce three variants of QGFN: $p$-greedy, $p$-quantile, and $p$-of-max, which are evaluated on various tasks, including molecular design, RNA design, and bit sequence generation. The results show that QGFN variants improve the number of high-reward samples while maintaining diversity, outperforming strong baselines. The paper also discusses the trade-offs between greediness and diversity, and provides insights into why QGFN works, including the role of the action-value function in guiding the agent towards high-reward branches.
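To make the role of the mixing parameter $p$ concrete, here is a minimal Python sketch of how a GFN forward policy and an action-value estimate $Q$ might be combined at a single state. This summary names the three variants but does not define them, so the rules below are assumptions read off the variant names: $p$-greedy takes the $Q$-maximizing action with probability $p$ and otherwise samples from the GFN policy; $p$-quantile samples from the GFN policy restricted to actions whose $Q$ is at or above the $p$-quantile; $p$-of-max restricts to actions whose $Q$ is at least $p$ times the maximum $Q$. The function names (`quantile`, `qgfn_policy`), the list-based interface, and the greedy fallback are hypothetical and not taken from the paper's code; the paper itself is the reference for the exact definitions.

```python
def quantile(values, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of a non-empty list."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def qgfn_policy(p_f, q, p, variant):
    """Return a greedier action distribution at one state.

    p_f     -- GFN forward-policy probabilities over the available actions
    q       -- action-value estimates for the same actions (assumed >= 0)
    p       -- mixing parameter in [0, 1]; larger means greedier
    variant -- "p-greedy", "p-quantile", or "p-of-max" (assumed definitions)
    """
    best = max(range(len(q)), key=lambda i: q[i])

    if variant == "p-greedy":
        # Mixture: weight p on the greedy action, weight 1 - p on the GFN policy.
        return [(1 - p) * pf + (p if i == best else 0.0)
                for i, pf in enumerate(p_f)]

    if variant == "p-quantile":
        threshold = quantile(q, p)
    elif variant == "p-of-max":
        threshold = p * q[best]
    else:
        raise ValueError(f"unknown variant: {variant}")

    # Prune actions below the threshold, then renormalize the GFN policy.
    masked = [pf if qi >= threshold else 0.0 for pf, qi in zip(p_f, q)]
    total = sum(masked)
    if total == 0.0:
        # Assumption: fall back to the greedy action if nothing survives.
        return [1.0 if i == best else 0.0 for i in range(len(q))]
    return [m / total for m in masked]
```

Under these assumptions, $p = 0$ recovers the original GFN policy in all three cases, and increasing $p$ shifts probability mass toward high-$Q$ actions. An action can then be drawn from the returned distribution with, for example, `random.choices(range(len(p_f)), weights=qgfn_policy(p_f, q, p, variant))`.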