[slides] Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management

This paper explores the application of Deep Reinforcement Learning (DRL) to the One-Warehouse Multi-Retailer (OWMR) inventory management system, a prototypical distribution and inventory system. In an OWMR system, a warehouse pools inventory for multiple retailers, and decisions must be made on how much to order and how to distribute it among the retailers. Traditional approaches often rely on heuristic policies, which can be time-consuming to construct and less effective for complex problem variants. DRL, a general-purpose technique for sequential decision-making, has shown promise in various inventory control problems. However, applying DRL to OWMR systems is challenging because the number of possible allocations grows exponentially and the neural network output must handle multiple discrete actions. The authors propose a DRL algorithm that infers a multi-discrete action distribution, in which the number of output nodes grows linearly with the number of retailers. This approach reduces the computational complexity and allows for simultaneous decision-making across all nodes. Additionally, a random rationing policy is introduced to improve the learning of feasible retailer order quantities when total retailer orders exceed the available warehouse inventory. The resulting algorithm outperforms general-purpose benchmark policies by 1–3% in the lost sales case and by 12–20% in the partial back-ordering case, but performs similarly to the benchmarks under complete back-ordering. The paper includes a detailed formulation of the OWMR system as a Markov Decision Process (MDP) and discusses the use of Proximal Policy Optimization (PPO) for training the neural network. Numerical experiments are conducted on 14 instances with up to 11 stock points, covering various customer behavior models and problem variants.
The results demonstrate the effectiveness of the proposed DRL approach, highlighting its potential for improving inventory management in multi-echelon supply chains.
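
To make the action-space argument and the rationing step more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that every stock point (the warehouse and each retailer) picks an integer order quantity between 0 and a hypothetical cap `max_order`, and that random rationing hands out warehouse units one at a time to a randomly chosen retailer whose order is not yet filled; the paper's exact rationing rule may differ. All function and parameter names are illustrative.

```python
import random


def joint_output_nodes(num_retailers, max_order):
    """Output nodes needed if every joint order vector is its own action.

    Assumption: the warehouse and each retailer choose from max_order + 1
    quantities, so the count is exponential in the number of stock points.
    """
    return (max_order + 1) ** (num_retailers + 1)


def multi_discrete_output_nodes(num_retailers, max_order):
    """Output nodes needed with one categorical head per stock point.

    The count is linear in the number of retailers.
    """
    return (max_order + 1) * (num_retailers + 1)


def random_rationing(orders, warehouse_stock, rng=None):
    """Cut retailer orders down to what the warehouse can actually ship.

    Assumption (hypothetical rule): when total orders exceed the stock,
    units are granted one by one to a retailer drawn uniformly at random
    among those whose order is not yet filled.
    """
    rng = rng or random.Random()
    if sum(orders) <= warehouse_stock:
        return list(orders)  # already feasible, ship as requested
    shipped = [0] * len(orders)
    for _ in range(warehouse_stock):
        # Total shipped is still below total ordered, so this list is non-empty.
        open_retailers = [i for i, q in enumerate(orders) if shipped[i] < q]
        chosen = rng.choice(open_retailers)
        shipped[chosen] += 1
    return shipped


if __name__ == "__main__":
    # 10 retailers plus the warehouse gives 11 stock points.
    print(multi_discrete_output_nodes(10, 20))
    print(joint_output_nodes(10, 20))
    print(random_rationing([5, 3, 4], 6, random.Random(0)))
```

With 10 retailers and an assumed cap of 20 units per order, the multi-discrete output needs 231 nodes, whereas enumerating joint actions would need 21^11 (about 3.5 × 10^14). The rationing function always returns quantities that sum to at most the warehouse stock and never exceed what each retailer asked for.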

01/01/2024 | Ilya Kaynov, Marijn van Knippenberg, Vlado Menkovski, Albert van Breemen, Willem van Jaarsveld