Learning Constrained Parametric Differentiable Predictive Control Policies With Guarantees

2024 | Ján Drgona, Member, IEEE, Aaron Tuor, and Draguna Vrabie, Member, IEEE
This paper presents a differentiable predictive control (DPC) method for offline learning of constrained neural control policies for nonlinear dynamical systems with performance guarantees. DPC leverages automatic differentiation (AD) to efficiently compute sensitivities of the model predictive control (MPC) objective function and constraint penalties, yielding direct policy gradients for gradient-based optimization. Probabilistic guarantees on closed-loop stability and constraint satisfaction are derived using performance indicator functions and Hoeffding's inequality. The method learns neural control policies for a range of parametric optimal control tasks, including stabilizing systems with unstable dynamics, tracking time-varying references, and satisfying nonlinear state and input constraints. Compared with alternative approaches, DPC offers fast and memory-efficient controller design with no dependency on a supervisory controller: it is more sample efficient than model-free reinforcement learning (RL), provides significant online speedups over implicit MPC, and scales better, with a smaller memory footprint, than explicit MPC (eMPC). The method is implemented in open-source code and evaluated in five numerical studies, including an unstable double integrator, a two-tank system, a quadcopter model, and a parametric obstacle avoidance problem, demonstrating its effectiveness in learning constrained policies with guarantees and its scalability to large-scale systems.
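The core DPC idea described above can be sketched in a few lines: roll a neural policy through a differentiable system model over a prediction horizon, accumulate an MPC-style cost plus soft constraint penalties, and let AD backpropagate through the rollout to obtain the policy gradient. The sketch below uses PyTorch on a simple double-integrator model; the dynamics, network sizes, penalty weights, and training schedule are all illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed discrete-time double-integrator model x_{k+1} = A x_k + B u_k
# (illustrative only; the paper treats general nonlinear systems).
A = torch.tensor([[1.0, 1.0], [0.0, 1.0]])
B = torch.tensor([[0.0], [1.0]])

policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

N, u_max = 10, 1.0  # prediction horizon and input bound (assumed values)

def dpc_loss(x0):
    """MPC-style rollout cost with a soft penalty on the input constraint."""
    x, loss = x0, 0.0
    for _ in range(N):
        u = policy(x)
        # Quadratic stage cost on state and input.
        loss = loss + (x ** 2).sum(-1).mean() + 0.1 * (u ** 2).sum(-1).mean()
        # ReLU penalty activates only when |u| exceeds u_max.
        loss = loss + 10.0 * torch.relu(u.abs() - u_max).sum(-1).mean()
        x = x @ A.T + u @ B.T  # differentiable model rollout
    return loss

x_val = 2.0 * torch.rand(512, 2) - 1.0  # fixed batch of sampled initial states
loss_before = dpc_loss(x_val).item()

for _ in range(300):
    x0 = 2.0 * torch.rand(256, 2) - 1.0  # resample parametric problem instances
    opt.zero_grad()
    dpc_loss(x0).backward()  # AD supplies the direct policy gradient
    opt.step()

loss_after = dpc_loss(x_val).item()
```

Because the policy is trained offline over sampled problem instances, online control reduces to a single forward pass through the network, which is the source of the reported speedups over implicit MPC.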
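The Hoeffding-based certification mentioned above evaluates the learned policy on i.i.d. sampled closed-loop rollouts, records a binary indicator of stability/constraint satisfaction per rollout, and lower-bounds the true satisfaction probability from the empirical rate. A minimal sketch of the one-sided bound follows; the sample counts and confidence level are illustrative numbers, not results from the paper.

```python
import math

def hoeffding_lower_bound(successes, n, delta):
    """Lower confidence bound on the true satisfaction probability.

    For n i.i.d. indicator samples (1 = rollout satisfied the criterion),
    the one-sided Hoeffding inequality gives P(p >= p_hat - t) >= 1 - delta
    with t = sqrt(ln(1/delta) / (2 n)).
    """
    p_hat = successes / n
    t = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - t)

# Illustrative: 9,950 of 10,000 sampled rollouts satisfy the constraints;
# with confidence 1 - 1e-3 the true rate is at least ~0.976.
bound = hoeffding_lower_bound(9950, 10000, delta=1e-3)
```

The bound tightens as the number of sampled rollouts grows, so the guarantee can be strengthened offline simply by evaluating the fixed policy on more samples.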