Reinforcement Learning for Solving the Vehicle Routing Problem

21 May 2018 | Mohammadreza Nazari, Afshin Oroojlooy, Martin Takáč, Lawrence V. Snyder
This paper presents an end-to-end reinforcement learning (RL) framework for solving the Vehicle Routing Problem (VRP). Rather than solving each instance from scratch, the framework trains a single model on problem instances sampled from a given distribution, using only reward signals and feasibility rules. The model is a parameterized stochastic policy whose parameters are optimized with a policy gradient algorithm; once trained, it produces a solution as a sequence of consecutive decisions in real time, without retraining for each new instance.

The model pairs a recurrent neural network (RNN) decoder with an attention mechanism, which lets it handle both the static elements of the system (e.g., customer locations) and the dynamic ones (e.g., remaining demands). Because the policy conditions directly on these inputs, the framework requires no explicit distance matrix and is robust to problem changes, such as a customer's demand being updated mid-route. Infeasible choices are excluded by masking, as the sketch below illustrates.
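As a concrete illustration, here is a minimal sketch of one attention-based decoding step in PyTorch. The class name, embedding dimensions, and masking convention are illustrative assumptions for this summary, not the authors' released code.

```python
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    """One decoding step: attend over node embeddings, mask infeasible
    nodes, and return a distribution over the next node to visit.
    Sketch only; dimensions and names are illustrative assumptions."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Additive attention of the form v^T tanh(W [static; dynamic; state])
        self.project = nn.Linear(3 * hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, static_emb, dynamic_emb, decoder_state, mask):
        # static_emb, dynamic_emb: (batch, n_nodes, hidden_dim)
        # decoder_state:           (batch, hidden_dim) -- RNN hidden state
        # mask: (batch, n_nodes) bool, True where the node is feasible
        n_nodes = static_emb.size(1)
        state = decoder_state.unsqueeze(1).expand(-1, n_nodes, -1)
        feats = torch.cat([static_emb, dynamic_emb, state], dim=-1)
        scores = self.v(torch.tanh(self.project(feats))).squeeze(-1)
        # Feasibility rules (e.g., a demand exceeding the remaining vehicle
        # load) are enforced by masking rather than learned from scratch.
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1)  # (batch, n_nodes)
```

At each step, the next node is sampled from the returned distribution during training (or chosen greedily at inference), the dynamic embeddings are updated to reflect remaining demands and vehicle load, and decoding repeats until every demand is satisfied.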
Numerical experiments on the capacitated VRP show that the trained model outperforms the Clarke-Wright savings heuristic, the Sweep heuristic, and Google's OR-Tools in solution quality, with comparable computation time once training is complete. The framework naturally accommodates split deliveries and extends to other VRP variants, including the stochastic VRP and settings with multiple depots or multiple vehicles.

Because solutions are generated in real time, the approach is well suited to on-demand services such as deliveries and taxis. The paper concludes that the framework has significant potential for real-world applications and can be applied to other combinatorial optimization problems.
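To make the training procedure described above concrete, the following is a minimal REINFORCE-with-baseline sketch in PyTorch. The `actor` and `critic` modules and the tensor layouts are hypothetical stand-ins, assuming an actor that samples routes together with their log-probabilities; the reward is the negative total route length, consistent with the paper's objective.

```python
import torch
import torch.nn.functional as F

def route_lengths(coords, routes):
    # coords: (batch, n_nodes, 2) node coordinates
    # routes: (batch, steps) node indices in visiting order
    idx = routes.unsqueeze(-1).expand(-1, -1, 2)
    path = torch.gather(coords, 1, idx)  # visited coordinates, in order
    return (path[:, 1:] - path[:, :-1]).norm(dim=-1).sum(dim=1)

def reinforce_step(actor, critic, optimizer, coords):
    """One policy-gradient update. `actor` and `critic` are hypothetical
    modules: actor.sample returns routes and their total log-probability;
    critic estimates the expected reward per instance."""
    routes, log_probs = actor.sample(coords)  # log_probs: (batch,)
    reward = -route_lengths(coords, routes)   # shorter tour = higher reward
    baseline = critic(coords).squeeze(-1)     # learned baseline, (batch,)

    advantage = (reward - baseline).detach()  # no gradient through the baseline here
    actor_loss = -(advantage * log_probs).mean()  # REINFORCE with baseline
    critic_loss = F.mse_loss(baseline, reward)    # fit baseline to observed reward

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
```

The learned baseline reduces the variance of the gradient estimate; plain REINFORCE on raw tour lengths would converge considerably more slowly.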