2024 | Illya Kaynov, Marijn van Knippenberg, Vlado Menkovski, Albert van Breemen, Willem van Jaarsveld
Deep Reinforcement Learning (DRL) is applied to the One-Warehouse Multi-Retailer (OWMR) inventory management problem, which involves managing inventory in a supply chain with a single warehouse and multiple retailers. The study evaluates DRL's performance against benchmark policies for three customer behavior models: lost sales, complete back-ordering, and partial back-ordering. The OWMR system is a multi-echelon inventory system where inventory is pooled at the warehouse and distributed to retailers. The challenge lies in allocating orders to retailers while considering inventory constraints and varying demand patterns.
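The three customer behavior models differ only in what happens to demand a retailer cannot serve from stock. The sketch below illustrates that difference for a single retailer and a single period. The function name `serve_demand`, the parameter `backorder_prob`, and in particular the rule that each unmet unit is back-ordered independently with a fixed probability are assumptions made for illustration; the summary does not specify how the study models partial back-ordering.

```python
import random


def serve_demand(on_hand, demand, model, backorder_prob=0.5, rng=None):
    """Serve one period of demand at a single retailer.

    Returns (new_on_hand, backordered, lost).
    `backorder_prob` is a hypothetical parameter used only by the
    partial back-ordering branch; it is an assumption, not the study's rule.
    """
    rng = rng or random.Random()
    served = min(on_hand, demand)
    unmet = demand - served

    if model == "lost_sales":
        backordered = 0                      # all unmet demand is lost
    elif model == "complete_backorder":
        backordered = unmet                  # all unmet demand waits
    elif model == "partial_backorder":
        # Assumption: each unmet unit waits with probability backorder_prob.
        backordered = sum(rng.random() < backorder_prob for _ in range(unmet))
    else:
        raise ValueError(f"unknown model: {model}")

    lost = unmet - backordered
    return on_hand - served, backordered, lost
```

With 5 units on hand and a demand of 8, the lost-sales branch loses 3 units, the complete back-ordering branch back-orders 3 units, and the partial branch splits those 3 units between the two outcomes.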
The study formulates the OWMR problem as a Markov Decision Process (MDP) and applies the Proximal Policy Optimization (PPO) algorithm for training a DRL agent. The DRL agent is designed to handle multi-discrete action distributions, which allows for efficient learning of policies that allocate orders to retailers. The agent uses a randomized sequential allocation rule to ensure feasible order quantities, which improves learning efficiency.
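The idea of a randomized sequential allocation rule can be sketched as follows: the agent proposes an order quantity per retailer, and the warehouse then visits retailers in a random order, shipping each as much of its request as the remaining stock allows. This is a minimal illustration under stated assumptions, not the study's implementation; the function name `allocate_sequentially` and its arguments are hypothetical.

```python
import random


def allocate_sequentially(requested, warehouse_stock, rng=None):
    """Ration warehouse stock over retailer requests in a random order.

    requested: non-negative integer order quantities, one per retailer
               (for example, the output of a multi-discrete policy).
    warehouse_stock: non-negative integer units on hand at the warehouse.
    Returns shipped quantities whose sum never exceeds warehouse_stock.
    """
    rng = rng or random.Random()
    order = list(range(len(requested)))
    rng.shuffle(order)  # random visiting order, so no retailer is always first

    shipped = [0] * len(requested)
    remaining = warehouse_stock
    for i in order:
        shipped[i] = min(requested[i], remaining)  # cap by what is left
        remaining -= shipped[i]
    return shipped
```

Because every proposed action is mapped to a feasible shipment, the policy never has to learn the stock constraint through penalties, which is one plausible reading of why such a rule helps learning.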
The study evaluates the performance of the DRL algorithm across 14 problem instances with varying numbers of retailers, demand distributions, and lead times. The results show that the DRL algorithm outperforms benchmark policies by 1-3% for the lost sales case and by 12-20% for the partial back-ordering case. However, for the complete back-ordering case, the DRL algorithm does not consistently outperform the benchmark.
The study highlights the potential of DRL for solving complex inventory problems, particularly in multi-echelon systems where traditional heuristic policies are difficult to construct. The approach is generalizable and can be applied to a wide range of inventory management problems. The results demonstrate that DRL can provide significant performance improvements, especially for model variants where constructing appropriate heuristics is challenging. The developed approach may be applicable to multi-echelon supply chains beyond the OWMR problem.