12 Jan 2017 | Irwan Bello*, Hieu Pham*, Quoc V. Le, Mohammad Norouzi, Samy Bengio
This paper introduces Neural Combinatorial Optimization (NCO), a framework for solving combinatorial optimization problems using reinforcement learning (RL) and neural networks. The method is applied to the Traveling Salesman Problem (TSP) and the Knapsack problem. The approach involves training a recurrent neural network (RNN) to predict permutations of cities, with the reward signal being the negative tour length. The model is optimized using policy gradient methods, and two approaches are considered: RL pretraining and active search. RL pretraining uses a training set to optimize the RNN, while active search iteratively refines the model on a single test instance. The method achieves near-optimal results on 2D Euclidean graphs with up to 100 nodes and solves Knapsack instances with up to 200 items optimally. The paper also discusses the generalization of the approach to other combinatorial problems, including the Knapsack problem, and highlights the effectiveness of the method in comparison to traditional heuristics and exact solvers. The results demonstrate that NCO outperforms supervised learning and provides competitive solutions for complex optimization tasks.This paper introduces Neural Combinatorial Optimization (NCO), a framework for solving combinatorial optimization problems using reinforcement learning (RL) and neural networks. The method is applied to the Traveling Salesman Problem (TSP) and the Knapsack problem. The approach involves training a recurrent neural network (RNN) to predict permutations of cities, with the reward signal being the negative tour length. The model is optimized using policy gradient methods, and two approaches are considered: RL pretraining and active search. RL pretraining uses a training set to optimize the RNN, while active search iteratively refines the model on a single test instance. The method achieves near-optimal results on 2D Euclidean graphs with up to 100 nodes and solves Knapsack instances with up to 200 items optimally. The paper also discusses the generalization of the approach to other combinatorial problems, including the Knapsack problem, and highlights the effectiveness of the method in comparison to traditional heuristics and exact solvers. The results demonstrate that NCO outperforms supervised learning and provides competitive solutions for complex optimization tasks.