Ant Colony Sampling with GFlowNets for Combinatorial Optimization

22 May 2024 | Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jiwoo Son, Jinkyoo Park, Yoshua Bengio
This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a neural-guided probabilistic search algorithm for solving combinatorial optimization (CO) problems. GFACS integrates generative flow networks (GFlowNets), an emerging amortized inference method, with ant colony optimization (ACO), a well-established probabilistic search meta-heuristic. Specifically, a GFlowNet learns a constructive policy over combinatorial spaces that enhances ACO by providing an informed prior distribution over decision variables, conditioned on the input graph instance. In addition, a novel off-policy training algorithm scales conditional GFlowNets to large combinatorial spaces by leveraging local search and shared energy normalization. Experiments show that GFACS outperforms baseline ACO algorithms on seven CO tasks and is competitive with problem-specific heuristics for vehicle routing problems.

ACO is a general-purpose meta-heuristic for CO. It uses a constructive generation scheme that builds a solution by sequentially appending elements to a partial solution, and it benefits from robustness against local optima and a flexible generation process. Its main challenge is designing an effective prior distribution (the heuristic matrix) for its probability model. A recent advance, DeepACO, addresses this by pretraining instance-conditioned priors with policy-gradient reinforcement learning (RL). DeepACO, however, has limitations: the on-policy nature of the algorithm, the reward-maximizing form of RL, and a failure to account for the inherent symmetry among combinatorial solutions. The sketch below illustrates ACO's constructive scheme and where such a learned prior enters.
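To make the constructive scheme concrete, here is a minimal NumPy sketch of how a single ant builds a TSP tour. The transition rule (pheromone^alpha * heuristic^beta) is the standard ACO rule; the function name `construct_tour` is ours, and the classical 1/distance heuristic stands in for the instance-conditioned prior that GFACS would predict with a GFlowNet.

```python
import numpy as np

def construct_tour(pheromone, heuristic, alpha=1.0, beta=1.0, rng=None):
    """Build one TSP tour with ACO's constructive scheme.

    Each step samples the next node with probability proportional to
    pheromone[i, j]**alpha * heuristic[i, j]**beta, the standard ACO
    transition rule. In GFACS/DeepACO, `heuristic` would be the
    instance-conditioned prior predicted by a neural network; any
    positive matrix (e.g. 1 / distance) works as a stand-in here.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = pheromone.shape[0]
    current = int(rng.integers(n))          # random start node
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        cand = np.array(sorted(unvisited))
        weights = (pheromone[current, cand] ** alpha
                   * heuristic[current, cand] ** beta)
        probs = weights / weights.sum()     # normalize over unvisited nodes
        nxt = int(rng.choice(cand, p=probs))
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour

# Usage: a random 20-city instance with the classical 1/distance prior.
rng = np.random.default_rng(0)
coords = rng.random((20, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
heuristic = 1.0 / (dist + np.eye(20))       # eye avoids division by zero
pheromone = np.ones((20, 20))
tour = construct_tour(pheromone, heuristic, rng=rng)
```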
GFACS addresses these limitations with GFlowNets, which permit off-policy training and better capture the multimodal structure of the target distribution. GFlowNets model solution generation as a process on a directed acyclic graph (DAG), which can represent the multiple trajectories that lead to the same state; accounting for this symmetry of combinatorial solutions yields more accurate probabilistic modeling.

GFACS is trained with an off-policy method that leverages local search operators and shared energy normalization, which lets the algorithm scale to large combinatorial spaces while improving the training stability of conditional GFlowNets (a sketch of such a loss appears at the end of this summary).

Comparative experiments show that GFACS outperforms DeepACO and classical ACO across a range of CO tasks, including vehicle routing, scheduling, and a grouping problem. It also matches or outperforms competitive unsupervised-learning-based vehicle routing solvers on large-scale instances. Finally, the method is compared with previous GFlowNet approaches to combinatorial optimization, demonstrating its effectiveness at addressing credit-assignment challenges.
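To make the training objective concrete, here is a minimal PyTorch sketch of a trajectory-balance-style GFlowNet loss with a per-instance energy normalization. The function name `tb_loss_shared_norm` and the mean-centering step are illustrative assumptions (one plausible reading of "shared energy normalization" across solutions of the same instance), not the paper's exact formulation.

```python
import torch

def tb_loss_shared_norm(log_pf, log_pb, costs, log_z, beta=50.0):
    """Trajectory-balance-style loss with per-instance energy normalization.

    All tensors describe K solutions sampled for ONE problem instance:
      log_pf: (K,) forward log-probabilities of the sampled trajectories
      log_pb: (K,) backward log-probabilities (e.g. uniform over the
              symmetric trajectories that yield the same solution)
      costs:  (K,) solution costs (e.g. tour lengths); the reward is
              exp(-beta * cost), so the log-reward is -beta * cost
      log_z:  scalar parameter estimating the instance's log partition sum

    Centering the energies by their batch mean (our stand-in for shared
    energy normalization) keeps log-rewards on a comparable scale across
    instances, which helps stabilize conditional GFlowNet training.
    Because trajectory balance does not require on-policy samples, log_pf
    may be evaluated on trajectories improved by local search (off-policy).
    """
    energy = beta * costs
    energy = energy - energy.mean()          # shared per-instance normalization
    log_reward = -energy
    residual = log_z + log_pf - log_reward - log_pb
    return (residual ** 2).mean()

# Usage with dummy tensors for K = 16 solutions of one instance:
K = 16
log_pf = torch.randn(K)
log_pb = torch.zeros(K)
costs = torch.rand(K)
log_z = torch.tensor(0.0, requires_grad=True)
loss = tb_loss_shared_norm(log_pf, log_pb, costs, log_z)
loss.backward()
```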